This book series is dedicated to my wife Phullara and our
children Sourav and Devleena 


Chittaranjan Kole 


Preface to the Series 


Genome sequencing has emerged as the leading discipline in the plant sci- 
ences coinciding with the start of the new century. For much of the twen- 
tieth century, plant geneticists were only successful in delineating putative 
chromosomal location, function, and changes in genes indirectly through 
the use of a number of “markers” physically linked to them. These included 
visible or morphological, cytological, protein, and molecular or DNA mark- 
ers. Among them, the first DNA marker, the RFLPs, introduced a revolu- 
tionary change in plant genetics and breeding in the mid-1980s, mainly 
because of their infinite number and thus potential to cover maximum chro- 
mosomal regions, phenotypic neutrality, absence of epistasis, and codomi- 
nant nature. An array of other hybridization-based markers, PCR-based 
markers, and markers based on both facilitated construction of genetic 
linkage maps, mapping of genes controlling simply inherited traits, and 
even gene clusters (QTLs) controlling polygenic traits in a large number 
of model and crop plants. During this period, a number of new mapping 
populations beyond F2 were utilized and a number of computer programs
were developed for map construction, mapping of genes, and for mapping 
of polygenic clusters or QTLs. Molecular markers were also used in the 
studies of evolution and phylogenetic relationship, genetic diversity, DNA 
fingerprinting, and map-based cloning. Markers tightly linked to the genes 
were used in crop improvement employing the so-called marker-assisted 
selection. These strategies of molecular genetic mapping and molecular 
breeding made a spectacular impact during the last one and a half decades 
of the twentieth century. But still they remained “indirect” approaches for 
elucidation and utilization of plant genomes since much of the chromo- 
somes remained unknown and the complete chemical depiction of them was 
yet to be unraveled. 

Physical mapping of genomes was the obvious consequence that facili- 
tated the development of the “genomic resources” including BAC and YAC 
libraries to develop physical maps in some plant genomes. Subsequently, 
integrated genetic-physical maps were also developed in many plants. This
led to the concept of structural genomics. Later on, emphasis was laid on 
EST and transcriptome analysis to decipher the function of the active gene 
sequences leading to another concept defined as functional genomics. The 
advent of techniques of bacteriophage gene and DNA sequencing in the 
1970s was extended to facilitate sequencing of these genomic resources in
the last decade of the twentieth century. 

As expected, sequencing of chromosomal regions would have led to too 
much data to store, characterize, and utilize with the then-available computer
software. But the development of information technology made the
life of biologists easier by leading to a swift and sweet marriage of biology 
and informatics, and a new subject was born—bioinformatics. 

Thus, the evolution of the concepts, strategies, and tools of sequencing 
and bioinformatics reinforced the subject of genomics—structural and func- 
tional. Today, genome sequencing has traveled much beyond biology and 
involves biophysics, biochemistry, and bioinformatics! 

Thanks to the efforts of both public and private agencies, genome 
sequencing strategies are evolving very fast, leading to cheaper, quicker, and 
automated techniques right from clone-by-clone and whole-genome shotgun 
approaches to a succession of second-generation sequencing methods. The 
development of software of different generations facilitated this genome 
sequencing. At the same time, newer concepts and strategies were emerging 
to handle sequencing of the complex genomes, particularly the polyploids. 

It became a reality to chemically—and so directly—define plant 
genomes, popularly called whole-genome sequencing or simply genome 
sequencing. 

The history of plant genome sequencing will always cite the sequencing 
of the genome of the model plant Arabidopsis thaliana in 2000 that was fol- 
lowed by sequencing the genome of the crop and model plant rice in 2002. 
Since then, the number of sequenced genomes of higher plants has been 
increasing exponentially, mainly due to the development of cheaper and 
quicker genomic techniques and, most importantly, the development of col- 
laborative platforms such as national and international consortia involving 
partners from public and/or private agencies. 

As I write this preface for the first volume of the new series 
“Compendium of Plant Genomes,” a net search tells me that complete or 
nearly complete whole-genome sequencing of 45 crop plants, eight crop 
and model plants, eight model plants, 15 crop progenitors and relatives, and 
three basal plants is accomplished, the majority of which are in the public 
domain. This means that we nowadays know many of our model and crop 
plants chemically, i.e., directly, and we may depict them and utilize them 
precisely better than ever. Genome sequencing has covered all groups of 
crop plants. Hence, information on the precise depiction of plant genomes 
and the scope of their utilization are growing rapidly every day. However, 
the information is scattered in research articles and review papers in jour- 
nals and dedicated Web pages of the consortia and databases. There is no 
compilation of plant genomes and the opportunity of using the information 
in sequence-assisted breeding or further genomic studies. This is the under- 
lying rationale for starting this book series, with each volume dedicated to a 
particular plant. 
Plant genome science has emerged as an important subject in academia,
and the present compendium of plant genomes will be highly useful to 
both students and teaching faculties. Most importantly, research scientists 
involved in genomics research will have access to systematic deliberations 
on the plant genomes of their interest. Elucidation of plant genomes is of 
interest not only for the geneticists and breeders, but also for practitioners of 
an array of plant science disciplines, such as taxonomy, evolution, cytology, 
physiology, pathology, entomology, nematology, crop production, biochem- 
istry, and obviously bioinformatics. It must be mentioned that information 
regarding each plant genome is ever-growing. The contents of the volumes 
of this compendium are, therefore, focusing on the basic aspects of the 
genomes and their utility. They include information on the academic and/ 
or economic importance of the plants, description of their genomes from a 
molecular genetic and cytogenetic point of view, and the genomic resources 
developed. Detailed deliberations focus on the background history of the 
national and international genome initiatives, public and private partners 
involved, strategies and genomic resources and tools utilized, enumeration 
on the sequences and their assembly, repetitive sequences, gene annotation, 
and genome duplication. In addition, synteny with other sequences, com- 
parison of gene families, and, most importantly, the potential of the genome 
sequence information for gene pool characterization through genotyping 
by sequencing (GBS) and genetic improvement of crop plants have been 
described. As expected, there is a lot of variation of these topics in the vol- 
umes based on the information available on the crop, model, or reference 
plants. 

I must confess that as the series editor, it has been a daunting task for 
me to work on such a huge and broad knowledge base that spans so many 
diverse plant species. However, pioneering scientists with lifetime expe- 
rience and expertise on the particular crops did excellent jobs editing the 
respective volumes. I myself have been a small science worker on plant 
genomes since the mid-1980s and that provided me the opportunity to per- 
sonally know several stalwarts of plant genomics from all over the globe. 
Most, if not all, of the volume editors are my longtime friends and col- 
leagues. It has been highly comfortable and enriching for me to work with 
them on this book series. To be honest, while working on this series I have 
been and will remain a student first, a science worker second, and a series 
editor last. And, I must express my gratitude to the volume editors and the 
chapter authors for providing me the opportunity to work with them on this 
compendium. 

I also wish to mention here my thanks and gratitude to Springer staff, 
particularly Dr. Christina Eckey and Dr. Jutta Lindenborn, for the earlier set 
of volumes and presently Ing. Zuzana Bernhart for all their timely help and 
support. 

I always had to set aside additional hours to edit books beside my profes- 
sional and personal commitments—hours I could and should have given to 
my wife, Phullara, and our kids, Sourav and Devleena. I must mention that
they not only allowed me the freedom to take away those hours from them 
but also offered their support in the editing job itself. I am really not sure 
whether my dedication of this compendium to them will suffice to do justice 
to their sacrifices for the interest of science and the science community. 


New Delhi, India Chittaranjan Kole 
In 2005, we embarked on the journey to develop a high-quality reference
sequence for the bread wheat genome. The vision was to produce a
sequence comparable in quality to the rice genome. Many told us that 
it would be impossible, and the best strategy would be a low coverage 
sequence. Yet, we persisted. We persisted because we listened to the breed- 
ers and future users of the genome sequence. They asked for a high-qual- 
ity sequence of the hexaploid bread wheat because it contributes more to 
the human diet than any other crop species and is the most widely grown 
crop in the world. With the publication of the IWGSC RefSeq v1.0 and its 
accompanying annotation in 2018, we marked the attainment of this vision. 
This journey was accomplished with contributions from scientists all over 
the globe and would not have been possible without the public-private part- 
nerships that ensued and the determination of many who never lost sight of 
the vision. 

IWGSC RefSeq v1.0 and its annotation created a paradigm shift that 
ushered in a new world for wheat breeders and scientists, and the advance- 
ments have been rapid and simply amazing. Thanks to technological 
advancements, genome sequencing of large polyploid genomes with highly 
repetitive content like wheat has now become routine and platinum qual- 
ity sequences of wheat can be delivered in weeks at a reasonable cost. New 
resources for the molecular genetic study of wheat and its application for 
wheat improvement are arriving at a record pace. 

The production of the Wheat Genome book is essential and highlights 
the groundbreaking research ongoing for this critical crop. This volume 
includes papers describing the development of the reference sequence, 
new assemblies of commercial varieties, genome-wide studies, and the 
accelerated cloning of agronomically important genes and provides valu- 
able resources and literature for fundamental and applied research, crop 
improvement and teaching. It illustrates the value and impact of having 
high-quality reference genomes for overall crop improvements that address 
the dual challenges of producing a reliable, safe, and sustainable supply of 
wheat while facing a rapidly changing climate. 
We are indebted to our colleagues from the global wheat community who
contributed to this book and for their continuous commitment to provide 
resources and knowledge to scientists and breeders around the world. 

Due to the significance of this material and the desire to ensure accessibility
to everyone, a group joined together to cover the costs associated with
publishing an open access book. We are grateful to the following entities 
and individuals who contributed to the costs: 


Rudi Appels, Kellye Eversole, and Catherine Feuillet
The Bread Wheat Reference Genome Sequence


Jane Rogers 


Abstract 


In 2018, the International Wheat Genome 
Sequencing Consortium published a ref- 
erence genome sequence for bread wheat 
(Triticum aestivum L.). The landmark 
achievement was the culmination of a 
thirteen-year international effort focused 
on the production of a genome sequence 
linked to genotypic and phenotypic maps to 
advance understanding of traits and acceler- 
ate improvements in wheat breeding. In this 
chapter, we describe the challenges of the 
project, the strategies employed, how the pro- 
ject adapted over time to incorporate techno- 
logical improvements in genome sequencing 
and the project outcomes. 
1.1 Introduction

In 2018, the International Wheat Genome 
Sequencing Consortium published a reference 
genome sequence for bread wheat (Triticum 
aestivum L.). The landmark achievement was 
the culmination of a thirteen-year international 
effort focused on the production of a genome 
sequence linked to genotypic/phenotypic maps 
to advance understanding of traits and accelerate 
improvements in wheat breeding. In this monograph,
we bring together contributions from colleagues
to highlight the advances and document
the resources now available for research on
wheat and its relatives.

This first chapter describes the challenges of 
developing the bread wheat reference genome 
sequence project, the strategies employed, 
how the project adapted over time to incorpo- 
rate technological improvements in genome 
sequencing and the project outcomes. The following
chapters cover: available data repositories
(Chap. 2); the use of chromosomes as a
focus underpinning the establishment of a
high-quality assembly (Chap. 3); the challenge of
the structural and functional annotation of the
genome (Chap. 4); the wheat transcriptome and
functional gene networks (Chap. 5); genome-level
diversity within cultivated wheats (Chap. 6);
advances in sequencing ancient wheat DNA
(Chap. 7); the impact of the durum wheat genome
in identifying new germplasm for breeding
(Chap. 8); the use of the genome sequence to
identify genes underpinning agronomic traits
(Chap. 9); new and faster approaches to cloning
disease resistance genes (Chap. 10); genome views
of the CIMMYT breeding programme (Chap. 11);
the gene pools contributing to wheat genetic
variation (Chap. 12); approaches to integrating
genomics into breeding strategies (Chap. 13);
pan-genomes for capturing new functionalities
and refining wheat genomics (Chap. 14); and the
extensive germplasm resources established within
the wheat community (Chap. 15).


1.2 Origins of the Wheat Genome Project


Since the early 1990s, there has been a growing 
realization across the world that to feed a rap- 
idly growing human population grain production 
needs to increase by an annual rate of 2% on 
an area of land equivalent to that already under 
cultivation. Wheat was one of the first domes- 
ticated food crops and continues to be the most 
important food grain source for humans today. 
Wheat is grown on a greater area than any other 
crop (approx. 255 million ha; Bonjean et al. 2016;
https://www.fao.org/faostat/en/#data) and is best 
adapted to temperate regions of the world. 

By 2003, demand for wheat already regularly 
outstripped annual global production, and, faced 
with an estimated 25% annual loss due to biotic
(pests) and abiotic stresses (heat, frost, drought 
and salinity), it was clear that a paradigm shift 
was needed in wheat breeding and understand- 
ing of wheat biology to attain a sustainable food 
supply. At the time, other areas of biology were 
benefitting from access to genome data gener- 
ated through high throughput DNA sequencing 
projects. The largest genome sequence avail- 
able was the human genome sequence (3 Gb), 
for which draft and finished versions were 
published in 2001 (Lander et al. 2001; Venter
et al. 2001) and 2004 (International Human
Genome Consortium 2004), respectively. The 
sequence rapidly yielded new information 
about the structure, organisation, genes, genetic 
traits and genome variation to make an imme- 
diate impact on human biology and medicine. 
The Arabidopsis thaliana genome sequence 
(ca. 100 Mb) published in 2000 (The Arabidopsis
Genome Initiative 2000) was similarly impact- 
ing understanding of genes and genetic traits in 
plants, and genome sequencing projects for rice 
(450 Mb) (Eckhardt 2000; International Rice 
Genome Sequencing Project and Sasaki 2005) 
and maize (ca 1 Gb) (Chandler and Brender 
2002) were underway. 

In November 2003, a USDA-NSF work- 
shop was convened to consider the feasibility 
and requirements of a wheat genome sequence 
(Gill et al. 2004). The development of genomic
resources for wheat lagged behind the other 
major crops due to the genome posing three 
major challenges. First, the wheat genome is 
very large. The genome size estimated from 
DNA-Feulgen studies of root tip nuclei was ca. 
17 Gb, over five times the size of the human 
genome. Second, early cytogenetic stud- 
ies established that several Triticeae species, 
including bread wheat, are polyploid and origi- 
nated from spontaneous hybridisation of diploid 
genomes (Kihara 1944; McFadden and Sears 
1946). The genome of bread wheat is allohexaploid,
comprising 21 pairs of homologous chromosomes
originating from three homoeologous
sets of seven chromosomes, referred to as the 
A, B and D sub-genomes. The hexaploid wheat 
genome arose from two hybridisation events, 
estimated to have taken place between 0.8 
and 0.5 million years ago and 8,000-10,000 years
ago, respectively. The first hybridisation event 
occurred between a species related to Triticum 
urartu (2n=2x=14; AuAu) and one or more
species from the Sitopsis section related most 
closely to Aegilops speltoides (2n=2x=14; 
SS), believed to be the closest living rela- 
tive to the B genome progenitor. The resulting 
fertile tetraploid (2n=4x=28; AABB) was
domesticated over 10,000 years ago and devel- 
oped into emmer wheat (Triticum turgidum). 
The hybridisation of emmer wheat in a region 
south of the Caspian Sea some 8,000-10,000 years
ago with Aegilops tauschii (2n=2x=14), a 
wild diploid with a D genome, led to the fer- 
tile hexaploid with an AABBDD genome, the 
ancestral bread wheat (Zohary et al. 2012).
This has subsequently undergone a number 
of structural and functional rearrangements, 
including slight reductions (2-10%) in the
size of the homoeologous genomes compared 
to the diploid ancestors, to produce the stable 
genome of bread wheat of today (Feldman and 
Levy 2009). Because these events have taken 
place over a short evolutionary timescale, the 
three sub-genomes exhibit high levels of homology,
with similar gene contents and high levels
of synteny with other grass species and diploid 
wheat relatives. These high levels of similarity 
have hampered genome sequence assembly and 
the assignment of genes or other tag sequences 
to specific sub-genomes to distinguish between 
specific variants that may have phenotypic 
importance. 

The third challenge for sequencing
the wheat genome is its very high repetitive 
sequence content. Early studies suggested that 
approximately 83% of the genome comprises 
transposable elements (TE) that arose from 
massive amplifications of inserted elements 
in the ancestral Triticeae genome. These have 
subsequently evolved independently in indi- 
vidual sub-genomes to give rise to characteris- 
tic quantitative and qualitative variations in the 
A, B and D genomes of modern bread wheat. 
Repeat elements have proved challenging for all 
sequence assembly algorithms, and the extent to 
which qualitative and quantitative differences 
in types of repeats and their distribution across 
the homoeologous chromosomes of hexaploid 
wheat could be or needed to be resolved to 
understand genomic function was an important 
consideration (see also Chap. 4). 

The USDA-NSF workshop participants rec- 
ognised that a high-quality reference genome 
sequence for wheat would underpin future
wheat improvement by providing access to a 
complete gene catalogue, an unlimited number 
of molecular markers to enable genome-based 
selection of new varieties and a framework for 
the efficient exploitation of natural and induced 
genetic diversity. It would also provide insights 
into the functioning of a polyploid genome. It 
was agreed that a wheat genome project should 
focus on the hexaploid wheat variety CHINESE 
SPRING, for which resources that had been 
developed previously included large genetic 
stocks of aneuploid lines (Sears 1954, 1966) 
and sets of tag sequences, used to evaluate the 
gene content. In recognition of the complexity 
of the genome, several pilot projects were proposed
to inform the development of a sequencing
strategy. These included (i) construction of
an accurate, sequence-ready physical map based 
on ordered BAC contigs; (ii) assessment of the 
feasibility of a chromosome-based approach 
for mapping and sequencing; and (iii) explora- 
tion of different strategies for gene enrichment. 
The outcomes of these projects were evaluated 
under the umbrella of the International Wheat 
Genome Sequencing Consortium (IWGSC), which was
established in 2005. The aims of the Consortium 
focus on advancing agricultural research for 
wheat production and utilisation by developing 
DNA-based tools and resources resulting from 
the complete sequence of the hexaploid wheat 
genome. 


1.3 Wheat Genome Strategy Development


The size and complexity of the bread wheat 
genome initially caused many to believe that 
determining a genome sequence would be 
impossible within a reasonable time frame and 
budget. Several projects were initiated that 
aimed to reduce the complexity by focusing on 
diploid relatives of wheat A and D genomes (T. 
urartu, Ling et al. 2013; Ling et al. 2018; A.
tauschii, Jia et al. 2013) or by focusing only on 
the assembly of genic regions from the hexaploid
wheat genome (see Chap. 4). Bread wheat
breeders and researchers, however, realised that 
to provide the tools and resources for bread 
wheat research would ultimately require the 
genome of the hexaploid (Feuillet et al. 2016). 

The determination of the DNA sequence of 
whole genomes is achieved by piecing together 
shorter lengths of DNA sequence in the order 
and orientation in which they occur in the organ- 
ism from which the DNA was extracted. By 
2005, two main approaches to genome sequenc- 
ing had been established and were being applied 
to different genomes. 


1.3.1 The Hierarchical Shotgun Strategy


This strategy is based on a two-step approach 
entailing initial construction of a physical map 
of the target genome followed by sequencing 
and assembly of short DNA fragments (typically 
500 bp-1 kb) generated from sets of overlap- 
ping clones that represent a minimal tiling path 
(MTP) across the genomic DNA. Sequences 
representing typically at least tenfold cover- 
age of each clone in paired sequence reads are 
assembled into longer pieces (contigs) using 
an assembly algorithm that identifies and joins 
matching sequences. The number of contigs 
into which each clone is assembled depends on 
a variety of factors, including clone represen- 
tation in sequence fragments, sequence depth 
and quality and the repeat content of the DNA. 
Once an initial assembly has been made, further
directed sequencing can be undertaken to
improve the sequence quality, close gaps and 
resolve ambiguities. Finally, sequence over- 
laps between clones are identified after removal 
of cloning and sequencing vector sequences, 
and the clone sequences are linked to produce 
a pseudomolecule representing chromosomal 
DNA. The hierarchical shotgun approach was 
used to produce the first reference sequence for 
the human genome (Lander et al. 2001) and to 
produce the first reference genome sequences 
for plants, A. thaliana (The Arabidopsis Genome 
Initiative 2000) and rice (International Rice 
Genome Sequencing Project and Sasaki 2005).
It has subsequently been used in the production
of reference sequences for the legume
Medicago truncatula (Young et al. 2011) and
to manage the complexity of the highly repeti- 
tive 3.5 Gb maize genome (Schnable et al. 
2009). By requiring prior generation of a physi- 
cal map, the hierarchical approach to genome 
sequencing increased the timespan and cost of 
genome projects. Some of the advantages, how- 
ever, were that it enabled targeted sequencing 
of regions and targeted resolution of problems, 
and it facilitated project and cost sharing by ena- 
bling distribution of mapping and sequencing 
among multiple groups. It also generated clone 
resources that have been used to sequence spe- 
cific genes or regions of interest ahead of the 
genome sequence becoming available. Until the 
very recent introduction of improved algorithms 
for short read sequence assembly (Clavijo et al. 
2017; Avni et al. 2017), accurate sequencing
reads in excess of 15-20 kb (De Coster et al. 
2021) and the development of alternatives to 
physical maps for long-range structural organisation,
such as optical maps (Keeble-Gagnère
et al. 2018) and chromosome conformation
capture sequencing (Hi-C, Burton et al. 2013), 
the hierarchical shotgun approach produced the 
most complete and accurate reference genome 
sequences, supporting detailed annotation and 
downstream applications in functional genomics. 
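
The choice of a minimal tiling path described above can be sketched in a few lines. This is an illustration only: the clone names, the coordinates and the `min_overlap` parameter are hypothetical, and the code is not the procedure used by any of the projects cited. It applies a standard greedy rule to clones already placed on a physical map: starting from the left-most clone, repeatedly pick the overlapping clone that reaches furthest to the right.

```python
from typing import List, NamedTuple


class Clone(NamedTuple):
    name: str   # hypothetical clone identifier
    start: int  # left end on the physical map (bp)
    end: int    # right end on the physical map (bp)


def minimal_tiling_path(clones: List[Clone], min_overlap: int = 0) -> List[Clone]:
    """Greedily pick overlapping clones that span the map with few clones.

    Stops at the first gap, i.e. when no remaining clone overlaps the
    current one by at least `min_overlap` bp and extends beyond it.
    """
    ordered = sorted(clones, key=lambda c: (c.start, -c.end))
    if not ordered:
        return []
    current = ordered[0]  # left-most clone; the longest one if starts tie
    path = [current]
    i = 1
    while True:
        best = None
        # Scan clones that overlap the current clone sufficiently.
        while i < len(ordered) and ordered[i].start <= current.end - min_overlap:
            candidate = ordered[i]
            if candidate.end > current.end and (best is None or candidate.end > best.end):
                best = candidate
            i += 1
        if best is None:
            return path
        path.append(best)
        current = best


# Made-up clones on a 400 bp toy map.
clones = [
    Clone("A", 0, 100), Clone("B", 50, 150), Clone("C", 90, 220),
    Clone("D", 140, 260), Clone("E", 200, 330), Clone("F", 300, 400),
]
print([c.name for c in minimal_tiling_path(clones)])      # ['A', 'C', 'E', 'F']
print([c.name for c in minimal_tiling_path(clones, 20)])  # ['A', 'B', 'C', 'E', 'F']
```

Requiring a larger overlap between neighbouring clones makes the joins between clone sequences easier to confirm, at the price of sequencing more clones, which is why the second call returns a longer path.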


1.3.2 Whole Genome Sequencing (WGS) Strategy


The WGS strategy is based on the random frag- 
mentation (shotgun fragmentation) of whole 
genome DNA, sequencing the ends of the 
fragments and assembly of the overlapping 
sequences to build up longer lengths of DNA. 
Typically, fragments of different sizes are used 
and pairs of sequences from the ends of sized 
fragments representing at least 30-fold cov- 
erage of the genome are assembled. In 1977, 
Sanger et al. (1977) reported the use of whole 
genome shotgun sequencing to assemble the 
genome of the bacteriophage φX174 (5386 bp).
Subsequently, the approach has been used to
sequence genomes of increasing complex- 
ity, including a wide variety of plants. It was 
championed in the late 1990s by C. Venter to 
sequence the genomes of Haemophilus influen- 
zae (Fleischmann et al. 1995), Drosophila mel- 
anogaster (Adams et al. 2000) and the human 
genome (Venter et al. 2001). As sequencing
costs have fallen with the introduction of sec- 
ond-generation sequencing technologies, whole 
genome shotgun approaches were considered 
a more tractable way to access large genomes, 
particularly those of plants (Feuillet et al. 2011; 
Jackson et al. 2011). 
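
To give a sense of scale for the fold-coverage figures quoted above, the following sketch computes how many reads a given coverage implies. The genome sizes (17 Gb for bread wheat, 5386 bp for φX174) and the 30-fold coverage come from the text; the read lengths of 150 bp and 500 bp are assumptions made for the example. The second function uses the idealised Lander-Waterman (Poisson) model, which assumes reads are placed uniformly at random and therefore ignores the repeat and bias effects discussed in this chapter.

```python
import math


def reads_needed(genome_size_bp: int, read_length_bp: int, coverage: int) -> int:
    """Number of reads required so that total bases = coverage x genome size."""
    total_bases = coverage * genome_size_bp
    return -(-total_bases // read_length_bp)  # integer ceiling division


def expected_uncovered_fraction(coverage: float) -> float:
    """Expected fraction of bases hit by no read under the Poisson model."""
    return math.exp(-coverage)


# Bread wheat at 30-fold coverage with assumed 150 bp reads.
print(reads_needed(17_000_000_000, 150, 30))  # 3400000000
# Bacteriophage phiX174 at 10-fold coverage with assumed 500 bp reads.
print(reads_needed(5386, 500, 10))            # 108
```

Under the idealised model almost no base is left unsequenced at 30-fold coverage, so the gaps seen in real whole genome assemblies of large plant genomes arise mainly from the factors described in the next paragraph rather than from insufficient sequence depth.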

Factors affecting the quality of the assembly 
that can be achieved with this approach include 
the completeness and depth of coverage of the 
genome in sequence fragments, the level of bias 
in the fragmentation, cloning and sequencing 
processes caused by specific sequence motifs or 
repetitive elements, the sequence depth (num- 
ber of times each individual piece of DNA is 
sequenced) and the power of the assembly algo- 
rithm. Highly repetitive genomes are particularly 
challenging where sequence read lengths are 
shorter than the length of repeats and reads can- 
not be positioned uniquely. As a result, they are 
often not assembled in the genome, leaving gaps. 
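
The effect of read length relative to repeat length can be shown with a toy example. The sequences below are invented for illustration and the exact string matching stands in for a real read aligner: a read lying wholly inside a repeated element matches both copies, whereas a read long enough to include unique flanking sequence on both sides has a single possible position.

```python
from typing import List


def occurrences(genome: str, read: str) -> List[int]:
    """Return every 0-based position at which `read` matches `genome` exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


# Toy genome: one 10 bp element present twice, separated by unique sequence.
repeat = "ACGTACGGAC"
genome = "TTTTT" + repeat + "CCCCC" + repeat + "GGGGG"

short_read = "GTACGG"                # shorter than the repeat, lies inside it
long_read = "TT" + repeat + "CC"     # spans the repeat plus unique flanks

print(occurrences(genome, short_read))  # [7, 22]
print(occurrences(genome, long_read))   # [3]
```

An assembler faced with the short read cannot tell which copy it came from, which is the situation that leads to collapsed or missing repeats in assemblies built from short reads.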

Although the hierarchical and whole genome 
sequencing strategies have often been regarded 
as strategic competitors, they can be used to 
complement each other to achieve a more com- 
plete result. Methods to integrate whole genome 
sequence data into a BAC-based genome and 
integration of BAC sequences into a whole 
genome shotgun have been developed result- 
ing in many of the higher-quality genome 
sequences being hybrid assemblies (e.g. mouse 
(Mouse Genome Sequencing Consortium 
2002), zebrafish (Howe et al. 2013), Drosophila
(Celniker and Rubin 2003), Medicago (Young 
et al. 2011), maize (Schnable et al. 2009), rice 
(International Rice Genome Sequencing Project 
and Sasaki 2005) and tomato (The Tomato 
Genome Sequencing Consortium 2012)). Such 
assemblies achieve more complete coverage 
of the genome, enabling more accurate annota- 
tion, whilst still delivering resources for targeted 
improvement, gene cloning, etc. 


1.4 IWGSC Strategic Roadmap 

The IWGSC published its first roadmap for the 
bread wheat genome in 2006. The strategy pro- 
posed was based on reducing the complexity of 
the genome by generating physical maps and 
sequences for individual chromosome arms. 
This had the advantage of reducing the size of 
the assembly challenge to between 200 and 
800 Mb, comparable to the sizes of other plant 
genomes (Doležel et al. 2007). It also largely 
eliminated problems of mis-assembling similar 
regions or sequences originating from homoe- 
ologous chromosomes. This chromosome-based 
approach was dependent upon the technologi- 
cal advances in flow cytometric chromosome 
sorting developed by the group of J. Doležel 
(Institute of Experimental Botany, Czech 
Republic) (see Chap. 3). Between 2004 and
2013, the group flow sorted and produced BAC 
libraries representing 37 bread wheat chromo- 
some/chromosome arms. These comprised a 
single library for chromosome 3B (Šafář et al. 
2004), a composite library for chromosomes 1D, 
4D and 6D (Janda et al. 2004) and individual 
libraries for each arm of the remaining 17 chro- 
mosomes. The complete set of BAC libraries 
contains 2,713,728 clones (Šafář et al. 2010). In
2008, Paux et al. (2008) reported the construc- 
tion of the first physical map of a wheat chro- 
mosome, 3B. The map covered approximately 
82% of the estimated size of the chromosome
and provided a minimal tile path of physically 
mapped clones for sequencing. It also provided 
a ‘proof of principle’ for the hierarchical chro- 
mosome-based strategy to map and sequence the 
hexaploid wheat genome. Following the genera- 
tion of the first physical maps, the IWGSC con- 
tinued its focus on the production of physical 
maps for the whole genome, recruiting groups 
throughout the world to join the enterprise. In 
total, 17 groups from 14 countries contributed 
and the physical maps for all chromosomes were 
complete by January 2014. 

Throughout the course of the wheat genome 
project, the strategy and roadmap evolved to 
take account of technological advances. In 
2010, the roadmap was updated to incorporate 
the generation of chromosome-based short read
sequence data into the strategy. The data provided
the first genome-wide information about
the distribution of genic sequences across the
21 chromosomes and provided an intermediate
gene catalogue for wheat research (International
Wheat Genome Sequencing Consortium 2014).
Two further strategic modifications were made
in 2014 and 2016, respectively. The first enabled
the integration of the physical maps with
genome-wide sequence data by generating
short sequence tag data from minimal tile paths
of BACs for chromosomes mapped using the
SNaPShot approach (see International Wheat
Genome Sequencing Consortium 2018). The
final update to the IWGSC wheat genome roadmap
reflected the breakthrough in sequence
assembly software developed by NRGene
(www.nrgene.com) and others (Clavijo et al.
2017) which made it possible to assemble a
whole genome sequence of bread wheat. By
integrating a whole genome shotgun assembly
with data derived from chromosomal maps
and genetic maps, the first reference genome
sequence for hexaploid bread wheat was produced
(Fig. 1.1).

Fig. 1.1 Overview of the global community contributing
to the sequencing of the wheat genome. National
flags indicate the country of origin of the research
groups contributing to the establishment of the
high-quality Triticum aestivum cv. CHINESE SPRING
reference genome assembly (IWGSC RefSeq v1.0)
including involvement in the flow sorting, chromosome
shotgun, generation of additional resources and
annotation. The times for the data set releases are
indicated in blue.


1.5 Impact of Sequencing Technology Improvement on IWGSC Strategy 


At the time of the USDA-NSF workshop, high 
throughput DNA sequencing was in a state of 
transition. Previously, the predominant sequenc- 
ing platforms had been based on fluorescent 
dideoxy nucleotide sequencing (so-called 
Sanger sequencing) which delivered of the order 
of 350-1000 bases per sequence using auto- 
mated gel-based or capillary separation systems. 
Driven by the human genome project and other 
large genome projects, between 1994 and 2004 
the sequence accuracy and output rose to around 
1 million bases per day per instrument, but the 
cost of sequencing remained relatively high at 
ca. 0.3 USD per sequence read (500 USD per 
raw Mb). The high cost and relatively slow pace 
of sequencing meant that even medium-sized 
genomes (500 Mb-1 Gb) required large, multi-year 
projects to produce even draft versions of 
genomes with wildly differing quality, depend- 
ing on the size and composition of repeat 
sequences. 
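
The two cost figures quoted above are linked by the read length. The following is a minimal sketch of that arithmetic; the average read length of 600 bases is an assumption chosen for illustration (it is not stated in the text), and the function name is hypothetical.

```python
def cost_per_raw_mb(cost_per_read_usd: float, read_length_bases: float) -> float:
    """Return the cost in USD of one million raw bases for a given per-read cost."""
    if read_length_bases <= 0:
        raise ValueError("read length must be positive")
    # Number of reads needed to accumulate one million raw bases.
    reads_per_mb = 1_000_000 / read_length_bases
    return cost_per_read_usd * reads_per_mb


# Assumption for illustration only: an average Sanger read of 600 bases.
ASSUMED_READ_LENGTH_BASES = 600

print(round(cost_per_raw_mb(0.3, ASSUMED_READ_LENGTH_BASES)))
```

Under this assumed read length, 0.3 USD per read corresponds to about 500 USD per raw Mb, in line with the figures given in the text; shorter reads would raise the cost per Mb proportionally.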

Around 2004, the first second-generation 
sequencing instruments began to emerge. The 
first was the 454 Life Sciences pyrosequencer 
(later acquired by Roche Diagnostics) that 
measured sequential DNA polymerase cata- 
lysed sequencing reactions in picotiter plate 
arrays (Ronaghi et al. 1998; Margulies et al. 
2005). Early instruments generated around 
100 million bases per day from ca. 0.5 million 
sequences of up to 100 nucleotides. The output 
improved with further development to approxi- 
mately 400 million bases from sequences up to 
400 nucleotides long in a 10-h run at a cost of 
around 15 USD per raw Mb by 2009. Whilst 
the 454 brought speed and cost benefits to high 
throughput sequencing, the accuracy was lower 
than ‘Sanger sequencing’, largely due to prob- 
lems with accurate determination of bases in 
homopolymers (Metzker 2010; Mardis 2011). 
This could be accommodated and corrected to 
some extent by sequence analysis and assembly 
software, but it still caused some problems for 
some genome sequences. 

The emergence of the highly parallelised 
pyrosequencing instrumentation of 454 Life 
Sciences led the way for more ‘second-gen- 
eration’ platforms offering massively parallel 
sequencing. The most successful of these was 
developed by Solexa and subsequently com- 
mercialised by Illumina™. The platform uses 
‘sequencing by synthesis’ to measure the incorporation 
of fluorescent nucleotides into millions 
of growing chains of DNA anchored to a 
glass surface which are scanned using a confo- 
cal microscope (Bennett et al. 2005). Initially, 
sequence read lengths were limited to around 30 
bases, but as the technology matured improve- 
ments in chemistry, imaging technology and 
software have reduced the sequence ascertain- 
ment bias and enabled routine collection of 
paired sequence reads up to 300 bases long from 
sized DNA fragments. As a result, rates of data 
collection rose from 300 Mb to over 100 Gb 
per day with high levels of sequence accuracy 
(Schatz 2015) and reduced the costs compared 
to Sanger sequencing by 4-5 orders of mag- 
nitude. By assembling overlapping sequences 
from paired reads derived from small fragments 
(300-400 bp), longer sequences can be built 
up that help to overcome some of the problems 
encountered in using Illumina technology to 
sequence large or repetitive genomes. There has 
also been significant investment in developing 
data management and sequence assembly pipe- 
lines in both the public and private domains to 
meet the challenges of documenting and assem- 
bling very large volumes of short read sequence 
data (see Chap. 2). These benefits have resulted 
in the Illumina technology becoming the most 
widely used second-generation technology with 
a broad range of applications including de novo 
genome sequencing, comparative genomics, 
gene expression, transcriptomics, DNA-protein 
interactions and methylation profiling. 

The earliest wheat genome-wide sequenc- 
ing projects focused on genic sequences with 
the sequencing of expressed sequence tags 
(ESTs) and cDNAs. A set of 1,073,845 EST 
sequences derived from  polyA-tailed tran- 
scripts were released by the Triticeae EST 
Cooperative in 1998 and used to produce a set 
of 40,000 Unigenes 
(http://www.ncbi.nlm.nih.gov/dbEST/dbESTsummary.html). 
In 2008, a Japanese initiative released 15,871 
annotated cDNA sequences (http://trifldb.psc.riken.jp). 
Subsequently, relatively small studies of 
sequences from plasmids, from the 3B BAC 
library and from a gene-enriched methyl fil- 
tration library, were used to develop estimates 
of the gene and repeat contents of the genome 
based on ‘Sanger sequencing’. Low sample 
sizes and sampling bias, however, produced 
widely ranging estimates of between 36,000 and 
300,000 for gene number and a repeat content 
ranging from 68 to 86%. 

The introduction of higher throughput new 
sequencing technologies facilitated the pro- 
duction of more extensive genome-wide data 
sets. In 2012, Brenchley et al. published the 
results of analysis of 85 Gb of sequence generated 
on the Roche 454 GS FLX Titanium and 
GS FLX+ platforms. Around 5 million scaffolds 
were assembled from 20 million sequence 
reads representing approximately fivefold cover- 
age of the CHINESE SPRING wheat genome. 
Although the data were highly fragmented, they 
provided 132,000 SNPs for use in genotyp- 
ing studies and estimates of the gene numbers 
at between 94,000 and 96,000 per sub-genome, 
with a repeat content of 79%. 
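
The fivefold coverage quoted above is the ratio of sequence yield to genome size. A minimal sketch of the calculation follows; the genome size of 17 Gb is an assumption made for illustration (it is not given in this section), and the function name is hypothetical.

```python
def fold_coverage(total_sequence_gb: float, genome_size_gb: float) -> float:
    """Return the average sequencing depth (fold coverage) of a genome."""
    if genome_size_gb <= 0:
        raise ValueError("genome size must be positive")
    return total_sequence_gb / genome_size_gb


# Assumption for illustration only: a hexaploid wheat genome of about 17 Gb.
ASSUMED_WHEAT_GENOME_GB = 17.0

depth = fold_coverage(85.0, ASSUMED_WHEAT_GENOME_GB)
print(f"{depth:.1f}-fold coverage")
```

The same ratio explains why the chromosome survey sequences described next, with far smaller targets per flow-sorted arm, could reach much greater depths from comparable amounts of data.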

In 2014, the IWGSC published the results 
of Illumina™ short read survey sequencing 
of chromosome 3B and the chromosome arms 
of the other 20 chromosomes of the wheat 
genome (IWGSC 2014). Based on between 
30-fold and 240-fold depth of sequence reads, 
sequences with contig L50s ranging from 1.8 
to 8.9 kb were assembled after removal of 
repetitive sequences that could not be assem- 
bled uniquely to give an estimated coverage of 
between 0.5 and 0.8 of each chromosome. From 
the sequence analysis, 124,000 gene models 
were allocated across the chromosome arms and 
ca. 75,000 were ordered using SNP genotyp- 
ing and/or synteny with other grass genomes. 
Whilst most of the genes were incomplete and 
the data provided little or no information about 
gene duplications and pseudogenisation, nor 
the structural relationships between genes and 
repeat sequences, these analyses still provided 
the first genome-wide view of the distribution 
of wheat genes across homoeologous chromo- 
somes. They also provided sets of chromosome- 
specific markers for gene selection and future 
genome-wide analyses. 

In addition to genome surveys, the new 
sequencing technologies were used for high- 
quality sequencing. 454 sequencing technology 
was used to produce the first reference quality 
sequence of a wheat chromosome, 3B (Choulet 
et al. 2014). Sequences generated from 8452 
MTP BAC clones in pools of ten BACs using 
8 kb paired-end barcoded libraries were incorpo- 
rated into an assembly of 833 Mb with a N50 for 
the sequence scaffolds of 892 kb (i.e. half of the 
chromosome sequence is represented by scaf- 
folds greater than 892 kb). Using 2594 anchored 
SNP markers, 1358 sequence scaffolds compris- 
ing 774.4 Mb with a scaffold N50 of 949 kb 
were used to construct a pseudomolecule representing 
the 3B chromosome. Annotation of the 
chromosome with the automated Triannot pipe- 
line (Leroy et al. 2012) identified and positioned 
5326 functional genes and 1938 pseudogenes. It 
was also possible for the first time to annotate 
transposable elements and obtain a view of their 
distribution along the chromosome (Choulet 
et al. 2014). 
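
The N50 statistic used above can be computed directly from a list of scaffold lengths. The sketch below follows the definition given in the text (half of the assembled sequence lies in scaffolds at least as long as the N50); the function name and the scaffold lengths are hypothetical values chosen for illustration, not data from the 3B assembly.

```python
def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    The N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half the total is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths in kb (illustrative values only).
scaffolds_kb = [900, 700, 400, 300, 200, 100]
print(n50(scaffolds_kb))
```

Because the statistic depends only on the length distribution, it describes contiguity and says nothing about whether the scaffolds are correctly ordered, which is why anchoring with markers was needed to build the pseudomolecule.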

Having established the principle of chro- 
mosomal MTP BAC sequencing for wheat, 
the sequencing of 3B was swiftly followed by 
projects for other chromosomes. By January 
2015, MTP sequencing of 1A, 1B, 2B, 3A, 3D, 
4A, 5B, 6B, 7A, 7B and 7D was underway in 
11 countries, using predominantly Illumina™ 
sequencing to take advantage of higher through- 
put and lower costs relative to other sequencing 
platforms. A variety of strategies were employed 
to increase the contiguity of BAC sequences, 
which assembled into between 1 and 200 con- 
tigs per BAC, depending on the nature of the 
sequence, the quality and depth of the sequence 
data and the assembly software employed (see 
Chap. 3). Additional targeted efforts included 
combining sequence data from different frag- 
ment sizes (e.g. data from 500 bp to 1 kb frag- 
ments with paired-end sequences (mate pairs) 
from fragments between 1 and 10 kb), incorpo- 
ration of long read sequence data generated on 
new platforms and comparison with BioNano 
Optical maps generated for individual BACs 
from flow-sorted chromosomes (see Chap. 3). 
Many of these efforts were ultimately superseded 
by the whole genome assembly, but much 
of the data has contributed to the refinement of 
the whole genome sequence to produce the first 
high-quality reference genome sequence for 
bread wheat. 


1.6 Building the Reference Genome Sequence of Bread Wheat 


One of the greatest challenges for genome 
sequencing is being confident that the sequence 
accurately represents the genome in coverage 
and in organisation along the chromosomes. 
Chromosome 3B was the first wheat chromo- 
some to achieve reference sequence quality and 
set a high standard for the rest of the genome. 
Representing more than 90% of the chromo- 
some, the BAC sequence contigs and scaffolds 
were organised along the chromosome using 
additional information derived from integrating 
chromosomal Illumina shotgun data, BAC end 
sequences and information from the physical 
map and high density genetic maps. 

As the second-generation short read sequenc- 
ing technologies became established, the 
throughput and data quality improved and the 
overall cost of data generation declined. In other 
spheres, population genetics studies were begin- 
ning to be based on whole genome comparisons, 
prompting the development of new methods for 
the rapid assembly and comparative analysis of 
increasingly large and complex genomes. Whole 
genome assemblies of hexaploid bread wheat 
based on defined sets of paired sequences gen- 
erated from the ends of sized DNA fragments 
were generated by Chapman et al. (2015) and 
Clavijo et al. (2017). These assemblies were 
greatly improved over previous assemblies, covering 
8.2 Gb and 13.4 Gb, with reported N50 
contig sizes of 24.8 kb and 88.8 kb, respectively. 
The organisation of the assembled sequence 
contigs and scaffolds relied, as in the case of 
chromosome 3B, on alignment to orthogonal 
genetic linkage maps. These were generated 
for wheat using the POPSEQ method enabled 
by high throughput sequencing and demon- 
strated initially in barley (Mascher et al. 2013; 
Chapman et al. 2015). 


In 2016, the IWGSC released a whole genome 
assembly of Illumina short read sequence data 
assembled with DeNovoMAGIC2™, software developed 
by NRGene that assembles Illumina™ short reads 
into highly accurate long, phased sequences, 
even when the data are derived from highly 
repetitive genomes. The assembled sequences 
totalled 14.5 Gb and were assigned to chromosomal 
locations using POPSEQ data (Chapman 
et al. 2015) and a chromosome conformation 
capture (Hi-C) map constructed from Illumina 
sequence data produced from four independent 
Hi-C libraries. The assembly was released 
as IWGSC WGA v0.4. It represented over 90% 
of the genome and contained over 97% of known 
genes. Additional work was undertaken to integrate 
IWGSC WGA v0.4 with chromosome-based 
physical maps, Whole Genome Profiling Tags 
generated from chromosomal BAC MTPs (van 
Oeveren et al. 2011), sequenced BACs and opti- 
cal maps (available at the time for the Group 7 
chromosomes). This resulted in the IWGSC 
Reference Sequence v1.0 released in January 
2017 together with gene annotation based on 
extensive RNASeq data, annotations of trans- 
posable elements, duplicated regions and inte- 
gration of molecular markers (IWGSC 2018). 
The goal of the IWGSC wheat genome pro- 
ject was to produce an annotated reference 
genome sequence for wheat and make it avail- 
able in the public domain to underpin wheat 
research and improvement. The release of 
IWGSC RefSeq v1 and the first analyses pub- 
lished in 2018 marked the culmination of the 
project and the beginning of the next chapter 
of wheat research. Throughout the genome pro- 
ject, verified sequence data sets were released 
through the IWGSC repository hosted at INRA, 
France, GrainGenes and the major public 
sequence data repositories hosted at EBI, NCBI 
and DDBJ (see Chap. 2). New insights have 
emerged about the structure of the genome and 
the distribution of features, including genes, 
repeat sequences and regulatory factors, together 
with information about temporal and spatial tis- 
sue-specific gene expression and regulation. The 
genome sequence has prompted the development 
of new tools for population studies to identify 
genomic features associated with specific traits. 
For example, genome-wide SNP assays and 
computational platforms for analysis are being 
developed together with tools for the assembly 
and comparative analyses of multiple genome 
sequences (Chap. 6; Walkowiak et al. 2020). The 
high quality of the sequence is also enabling tar- 
geted genetic manipulation work (see Chap. 10). 

Whilst IWGSC RefSeq v1 represented a 
highly contiguous genome sequence covering 
approximately 94% of the genome with contig, 
scaffold and super-scaffold N50s of 52 kb, 7 Mb 
and 22.8 Mb, respectively, gaps remained. As 
new data becomes available, the sequence will 
be updated and improved. The first updated 
sequence, IWGSC Reference Sequence v2.1 
(Zhu et al. 2021), was based on alignments to 
optical maps and refined the reference genome by 
correcting the orientation of some scaffolds and 
filling gaps in the genome sequence. With the 
improvement in so-called third-generation long 
read sequencing technologies, further updates to 
the reference genome sequence can be expected. 
In 2020, Alonge et al. used data from IWGSC 
RefSeq v1 to improve and annotate a sequence 
assembly generated from PacBio long read 
sequence data (Alonge et al. 2020). PacBio long 
read sequence data were also used to assemble 
the sequence of the bread wheat Triticum aes- 
tivum cultivar KARIEGA (Athiyannan et al. 
2022), and Oxford Nanopore long read sequence 
data were used to assemble Triticum aestivum 
cultivar RENAN (Aury et al. 2021) to enable 
functional studies of these varieties. 

The goal of the IWGSC was to produce a refer- 
ence genome sequence for bread wheat that would 
enable wheat research and breeding improvements. 
IWGSC RefSeq v1 has provided an excellent 
foundation that is shared by the international 
wheat community for future developments. 
Abstract 


Wheat data integration and FAIRification 
are key to tackling the challenge of wheat 
improvement. The data repositories presented 
in this chapter play a central role in generat- 
ing knowledge and allow data exchange and 
reuse. These repositories rely on international 
initiatives such as (i) the International Wheat 
Genome Sequencing Consortium (IWGSC), 
which delivers common genomics resources 
such as reference sequences, communal 
Web-based seminars and (ii) the Wheat 
Information System (WheatIS) of the Wheat 
Initiative  (http://www.wheatis.org), which 
improves the interoperability and findability 
of the wheat data across the repositories. 


Keywords 


Wheat data - Data repositories - IWGSC - 
GrainGenes - Ensembl - FAIR 


2.1 Introduction 

According to the Food and Agriculture 
Organisation (FAO), wheat is the most widely 
cultivated crop on Earth, contributing about a fifth 
of the total calories consumed by humans 
(https://www.fao.org/faostat/en/#data). To meet the 
challenge of delivering safe, high-quality and 
health-promoting food and feed in an environmentally 
sensitive, economical and sustainable manner, it 
is generally considered that wheat improvement 
needs molecular breeding to complement more 
standard approaches. Furthermore, the efforts of 
breeding happen in a context of climate change 
but are still limited by insufficient knowledge 
and understanding of the molecular basis of cen- 
tral agronomic traits. In order to address the sci- 
entific questions related to this challenge, the 
wheat research community generates large and 
heterogeneous datasets. The greatest value of 
these data lies in their integration to generate new 
knowledge as a result of effective sharing to allow 
transparency and openness. 

The wheat data landscape relies on reposi- 
tories centred on (i) one or multiple data types 
(such as genomics, genetics or phenomics) that 
are highly curated and integrated with a com- 
mon reference genome (e.g. the accession 
CHINESE SPRING developed by the IWGSC, 
2018), or (ii) projects or communities of users with 
dedicated tools to mine the data. To improve the 
FAIRness (Findable, Accessible, Interoperable 
and Reusable; Wilkinson et al. 2016a) of the 
wheat datasets and databases, the WheatIS 
expert working group of the Wheat Initiative 
recommended standards and developed a data 
discovery tool dedicated to improving the findability 
of wheat data across repositories (Dzale 
Yeumo et al. 2017; Sen et al. 2020). 

In this chapter, we describe major wheat data 
repositories and tools, and how they integrate 
different types of wheat data following the FAIR 
principles. 


2.2 IWGSC Data Repository 

The International Wheat Genome Sequencing 
Consortium (IWGSC) has developed a variety 
of resources for bread wheat (Triticum aestivum 
L.) through its long-term efforts to achieve a 
high quality and functionally annotated reference 
wheat genome sequence (accession CHINESE 
SPRING). These data are available in a dedicated 
IWGSC data repository 
(https://wheat-urgi.versailles.inrae.fr/Seq-Repository, Alaux et al. 2018) 
categorised by data type as shown in Fig. 2.1. 


2.2.1 Sequence Assemblies and Annotations 


IWGSC wheat genome sequence assemblies are 
available for download, BLAST (Altschul et al. 
1990), and display in genome browsers. The 
assembly dataset includes the draft and the refer- 
ence sequences, along with their annotations. 

The draft survey sequence assembly (IWGSC 
Chromosome Survey Sequence (CSS) v1, 
IWGSC 2014) and the chromosome 3B reference 
sequence (the first reference quality chromosome 
sequence obtained by the consortium, 
Choulet et al. 2014) were released in 2014, followed 
by two improved versions of the CSS 
(v2 and v3). The virtual gene order map generated 
for the CSS, the POPSEQ data used 
to order sequence contigs on chromosomes 
(Mascher et al. 2013) and mapped marker sets 
were associated with these assemblies. 

The reference sequence of the bread wheat 
genome released in 2018 (IWGSC RefSeq v1.0, 
2018) included the whole genome, pseudomole- 
cules of individual chromosomes or chromosome 
arms, scaffolds with the structural and functional 
annotation of genes, transposable elements (TEs) 
and non-coding RNAs. In addition, mapped 
markers as well as annotations supported with 
alignments of nucleic acid and protein evi- 
dence were made available. Manual annotations 
for specific gene families or regions of specific 
chromosomes (ca. 3685 genes) were included in 
the IWGSC RefSeq v1.1 annotations. This v1.1 
annotation set was updated to v1.2 by integrat- 
ing a set of 117 novel genes and 81 microRNAs 
manually curated by the wheat community fol- 
lowing guidelines provided by IWGSC. 

The improved version IWGSC RefSeq v2.1 
assembly was released in 2021 (Zhu et al. 2021), 
which relied on whole-genome optical maps and 
contigs assembled from whole-genome-shotgun 
Pacific Biosciences (PacBio) reads (Zimin et al. 
2017). Optical maps were used to detect and 
resolve chimeric scaffolds, anchor unassigned 
scaffolds, correct ambiguities in positions and 
orientations of scaffolds, create super-scaffolds 
and estimate gap sizes more accurately. PacBio 
contigs were used for gap closing, and pseu- 
domolecules of the 21 CHINESE SPRING 
chromosomes were re-constructed to develop 
this new reference sequence. The correspond- 
ing IWGSC v2.1 annotation accompanying the 
IWGSC RefSeq v2.1 assembly was also completed. 
The transposable elements (TEs) in the 
resulting assembly IWGSC RefSeq v2.1 were 
reannotated, and gene annotations were updated 
by transferring the known gene models from previous 
annotations using a fine-tuned, dedicated 
strategy implemented in the Marker-Assisted 
Gene Annotation Transfer for Triticeae pipeline 
(https://forgemia.inra.fr/umr-gdec/magatt). 
The released IWGSC Annotation v2.1 contains 
266,753 genes comprising 106,913 high-confidence 
genes and 159,840 low-confidence genes 
(Zhu et al. 2021). 

Fig. 2.1 Homepage of the IWGSC data repository hosted by the Wheat@URGI portal [Retrieved in August 2023] 

In addition to the bread wheat reference 
sequence, the IWGSC also sequenced the 
genome of the Turkish bread wheat elite culti- 
var SONMEZ (Nelson et al. 2005) along with 
seven diploid and tetraploid species: Triticum 
durum cv. CAPPELLI, Triticum durum cv. 
STRONGFIELD, Triticum durum cv. SVEVO, 
Triticum monococcum, Triticum urartu, Aegilops 
speltoides and Aegilops sharonensis (IWGSC 
2014). Download and BLAST services are available 
for these datasets at 
https://wheat-urgi.versailles.inrae.fr/Seq-Repository/Assemblies. 

More broadly, the IWGSC is responsible for 
organising workshops and seminars and making 
genomics tools available to the community 
(https://www.wheatgenome.org/) as shown in Fig. 2.2. 
For example, the Apollo portal from the national 
Australian Research Data Commons 
(https://apollo-portal.genome.edu.au/) has been set up to 
allow the curation of the IWGSC v2.1 annotation. 

Fig. 2.2 Summary of IWGSC activities 


2.2.2 Physical Maps and BAC Libraries 


Physical maps of the 21 bread wheat chromo- 
somes, based on high information content fluo- 
rescence fingerprinting (Nelson et al. 2005) or 
whole-genome profiling (Philippe et al. 2012) 
of flow-sorted chromosome or chromosome- 
arm specific BAC libraries, are stored and dis- 
played in a dedicated browser. The BAC clone 
assemblies were produced by IWGSC mem- 
bers using fingerprinted contigs (Soderlund 
et al. 2000) or LTC (Frenkel et al. 2010) soft- 
ware. The positions of individual BAC clones, 
markers and deletion bins were mapped onto 
physical contigs. The wheat physical map 
browser also provides a link to request the 
BAC clones from the French plant genomic 
resource centre. 


[Fig. 2.2 panel text: Genomics tools: Apollo v2.1; Pretzel wheat; Wheat exome panel and promoter capture panel] 


2.2.3 Expression Data 


RNA-Seq expression data are available as 
read counts and transcripts per kilobase mil- 
lion mapped reads for the IWGSC RefSeq v1.1 
annotation. A transcriptome atlas developed 
from 850 RNA-Seq datasets representing 32 tissues 
at different growth stages and stresses was 
mapped to the IWGSC RefSeq annotations v1.0 
and v1.1 (Ramírez-González et al. 2018). 
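
To illustrate how read counts relate to a length-normalised expression value, the following minimal sketch computes transcripts per million (TPM) under its usual definition: counts are divided by transcript length in kilobases and the resulting rates are scaled to sum to one million. The function name and the example counts and lengths are hypothetical, and the sketch is not a description of the pipeline used to produce the repository files.

```python
def tpm(read_counts, transcript_lengths_bp):
    """Convert raw read counts to transcripts per million (TPM)."""
    if len(read_counts) != len(transcript_lengths_bp):
        raise ValueError("counts and lengths must have the same size")
    if any(length <= 0 for length in transcript_lengths_bp):
        raise ValueError("transcript lengths must be positive")
    # Step 1: reads per kilobase of transcript.
    rates = [count / (length / 1000.0)
             for count, length in zip(read_counts, transcript_lengths_bp)]
    total_rate = sum(rates)
    if total_rate == 0:
        return [0.0] * len(rates)
    # Step 2: scale so that the values sum to one million.
    return [rate / total_rate * 1_000_000 for rate in rates]


# Hypothetical counts and transcript lengths for three genes.
counts = [100, 200, 300]
lengths_bp = [1000, 2000, 3000]
for value in tpm(counts, lengths_bp):
    print(f"{value:.1f}")
```

In this example the three genes receive equal TPM values although their raw counts differ, because the counts scale exactly with transcript length.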


2.2.4 Variation Data 


These datasets consist of the 1000 wheat exome 
project (He et al. 2019), whole exome capture 
and genotyping-by-sequencing approaches 
of 62 diverse wheat lines (Jordan et al. 2015) 
and varietal and intervarietal SNPs (Rimbert 
et al. 2018). VCF data files are downloadable, 
and the variant calls can be displayed in the browser 
(https://wheat-urgi.versailles.inrae.fr/Seq-Repository/Variations). 
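
As an illustration of how a downloaded VCF file can be inspected, the sketch below reads the fixed, tab-separated columns of the VCF format using only the Python standard library. The records, chromosome names and function names are made up for the example and do not come from the repository files.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alts: Tuple[str, ...]


def parse_vcf(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield one Variant per data line, skipping header lines that start with '#'."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 8:
            raise ValueError(f"expected at least 8 columns, got {len(fields)}")
        chrom, pos, _identifier, ref, alt = fields[:5]
        yield Variant(chrom, int(pos), ref, tuple(alt.split(",")))


def is_snp(variant: Variant) -> bool:
    """True when the reference and every alternate allele are single bases."""
    return len(variant.ref) == 1 and all(len(a) == 1 for a in variant.alts)


# Made-up records for illustration only.
example_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1A\t1200\t.\tA\tG\t50\tPASS\t.",
    "chr3B\t98765\t.\tC\tT,G\t.\t.\t.",
    "chr3B\t99000\t.\tCA\tC\t.\t.\t.",
]
variants = list(parse_vcf(example_lines))
print(sum(is_snp(v) for v in variants), "SNPs out of", len(variants), "variants")
```

For files of the size produced by exome capture projects, a dedicated VCF library or indexed access would normally be preferable; the sketch only shows the structure of the format.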


2.2.5 Chromatin Accessibility 


Using a differential nuclease sensitivity assay, 
the chromatin states were investigated in the 
coding and TE-rich repetitive regions of the 
allopolyploid wheat genome.  Micrococcal 
nuclease (MNase) scores in BigWig format for 
IWGSC RefSeq v1.0 assembly are available to 
download (Jordan et al. 2020). 


2.3 Wheat@URGI 

The Wheat@URGI portal, developed by 
INRAE (French National Research Institute for 
Agriculture, Food and Environment) URGI unit, 
hosts the IWGSC data repository and GnpIS, 
a dedicated information system following the 
Findable Accessible Interoperable Reusable 
(FAIR) principles: https://wheat-urgi.versailles.inrae.fr/ 
(Alaux et al. 2018; Pommier et al. 2019). 

GnpIS encompasses a set of integrated data- 
bases to manage genomic data using well-known 
tools such as BLAST, JBrowse, GBrowse and 
InterMine. An in-house database called GnpIS-coreDB, 
developed by URGI, manages genetic 
and phenomic plant data, especially for wheat, 
produced by French, European and 
international projects since 2000. A significant 
amount of this data is available as open access, 
and some project-restricted data can be obtained 
through a material transfer agreement. 

Data managed by GnpIS-coreDB include 
genetic information (markers, quantitative trait 
loci (QTLs), germplasm, genome-wide association 
studies (GWAS)), genomic information 
(SNP discovery experiments, genotyping and 
synteny) and phenomic data. The phenomic data 
are available as whole trials including phenotypic 
and environmental observations on well-identified 
plant material provided by reference 
sources such as European genebanks. Detailed 
descriptions of these datasets are available in 
Alaux et al. (2018) and Pommier et al. (2019), 
and Table 2.1 presents a data summary. 

The genetic and phenomic data have been 
produced from large collaborative projects such 
as BreedWheat (Paux et al. 2022) and Whealbi 
(Pont et al. 2019). 

These different types of data are linked 
within the GnpIS information system. This 
integration is organised around key data, also 
called “pivot data” as they are pivotal objects 
which allow integration between data types. 
The key objects used to link genomic resources 
to genetic data are markers and traits. Markers 
are mapped to the genome sequences and pro- 
vide information on neighbour genes and their 
function. They also have links to GnpIS-coreDB 
genetic maps, QTLs, genotyping and GWAS 
data. Traits link the genetic data to the phenomic 
data in GnpIS-coreDB and to synteny data dis- 
played by the PlantSyntenyViewer tool (Flores 
et al. 2023; Pont et al. 2013). 

The FAIRness of these data (including meta- 
data) can be summarised as follows: 


• Findability: (i) the data are searchable using 
our data discovery tools (WheatIS data discovery 
and FAIDARE, see below), Web 
interfaces (genome browsers), analysis tool 
(BLAST), data mining tool (WheatMine); (ii) 
digital object identifiers (DOIs) were generated 
for each accession. 

• Accessibility: phenotyping data are accessible 
through Breeding API (BrAPI) Web services 
(Selby et al. 2019) and file downloads. 

• Interoperability: the data are in standard 
formats (gff3, VCF, MCPD, MIAPPE, 
Papoutsoglou et al. 2020), and phenotyping 
data follow an ontology developed within 
the BreedWheat project and merged with the 
international wheat crop ontology (CO 321, 
Shrestha et al. 2012). 

• Reusability: (i) all the GnpIS tools have general 
terms of use and licence. Open access 
data including code are in CC BY 4.0; (ii) the 
data are sufficiently described to allow their 
reuse in new analysis. 
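
The interoperability point above rests on simple, well-specified text formats. As a minimal sketch, the following reads one feature line in GFF3 format (nine tab-separated columns, with `key=value` attributes separated by semicolons) using only the Python standard library. The feature line, identifiers and function name are hypothetical and serve only to show the layout.

```python
from urllib.parse import unquote


def parse_gff3_line(line: str) -> dict:
    """Split one GFF3 feature line into its nine columns and decode the attributes."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    seqid, source, feature_type, start, end, score, strand, phase, raw_attrs = fields
    attributes = {}
    for item in raw_attrs.split(";"):
        if item:
            key, _, value = item.partition("=")
            # GFF3 percent-encodes reserved characters in attribute values.
            attributes[unquote(key)] = unquote(value)
    return {
        "seqid": seqid,
        "source": source,
        "type": feature_type,
        "start": int(start),  # 1-based, inclusive
        "end": int(end),      # 1-based, inclusive
        "score": score,
        "strand": strand,
        "phase": phase,
        "attributes": attributes,
    }


# Hypothetical feature line for illustration only.
example = "chr1A\texample\tgene\t1000\t5000\t.\t+\t.\tID=gene0001;Name=demo%20gene"
feature = parse_gff3_line(example)
print(feature["attributes"]["ID"], feature["end"] - feature["start"] + 1, "bp")
```

Because coordinates are 1-based and inclusive, the feature length is `end - start + 1`, a detail that matters when annotations are combined with 0-based formats.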
Table 2.1 Genetic and phenomic wheat data summary hosted in the GnpIS-coreDB database of the Wheat@URGI portal in August 2023 

Data type       Item          Total number of data points   Open access   Restricted access 
Germplasm       Taxon         56                            56            0 
                Accession     15,031                        10,448        4583 
Genetic map     Map           30                            29            1 
                Marker        716,745                       314,390       402,355 
                QTL           749                           465           284 
Genotyping      Experiment    2/5]                          1             27 
                Sample        9556                          42            9543 
                Marker        680,463                       0             680,463 
SNP discovery                 724,132                       280,321       443,811 
Phenotyping     Trial         895                           833           62 
                Seed lot      8461                          5037          3653 
                Variable      405                           107           301 
                Observation   1,488,199                     602,553       885,646 
GWAS            Analysis      2013                          43            1970 
                Sample        3096                          2361          735 
                Variable      BIS                           3m            279 
                Marker        160,774                       4109          156,665 
                Association   1,014,694                     48,596        966,098 

2.4 GrainGenes 

The GrainGenes repository (https://wheat.pw.usda.gov, 
Yao et al. 2022, Fig. 2.3) is a digital 
platform and a community service provider 
that has been continuously supported by U.S. 
congressional funds since 1992 through the U.S. 
Department of Agriculture. Its stakeholders are 
primarily global small grain research communities 
who work on wheat, barley, rye and oat 
(Blake et al. 2022). Unlike many other small 
grain repositories, GrainGenes has decades-worth 
of genetic data: GrainGenes contains rich, 
peer-reviewed, curated data content (Odell et al. 2017), 
ranging from genetic to genomic, phenotypic to 
traits, and people to publications, with a myriad 
of search and visualisation tools to enhance data 
findability and information discovery. GrainGenes 
also provides services, such as the GrainGenes 
email list and a Twitter feed, for small grain 
communities through communicating community 
announcements, open positions, upcoming conference 
information and grant opportunities. 

The range of genome browsers at GrainGenes 
for wheat-related species attests to the data growth 
as a result of increasingly accessible sequencing 
platforms, advanced assembly algorithms and 
annotation pipelines 
(https://wheat.pw.usda.gov/GG3/genome_browser). 
GrainGenes, in addition 
to IWGSC's CHINESE SPRING v1 and v2 
assemblies, houses assemblies and annotations 
for Aegilops longissima, A. speltoides, A. sharonensis, 
five Aegilops tauschii accessions, wild 
emmer (ZAVITAN) and durum wheat SVEVO, 
as well as Triticum aestivum genomes from 
the 10+ Wheat Genome project and the hexaploid 
wheat pangenome. The genome browsers 
at GrainGenes are shared with the Triticeae 
Toolbox (T3) database for the benefit of small 
grain researchers. 

In its IWGSC CHINESE SPRING v1 genome 
browser, GrainGenes has many tracks that overlap 
with the IWGSC's data repository at Wheat@URGI 
and Ensembl Plants. In addition to those 
tracks, T3 created several tracks for variants, 
genome-wide association studies (GWAS), 
primers and quantitative trait loci (QTLs). The 
GrainGenes team created the guanine-quadruplex 
(G4) track, for this newly emergent transcription 
regulation element class (Cagirici and Sen 2020). 
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Fig. 2.3 Homepage of GrainGenes (https://wheat.pw.usda.gov) [Retrieved in July 2022] 


Some of GrainGenes' genome browsers overlap with the genome
browsers at other repositories such as Wheat@URGI or Ensembl
Plants. This duplication of displays is not excessive; it
ultimately serves the interest of small grains researchers,
because having the same datasets at multiple repositories allows
users to harness the different tools built on top of these
datasets, for example, BLAST services at GrainGenes
(https://wheat.pw.usda.gov/blast/) or the Ensembl Variant Effect
Predictor at Ensembl Plants.

One of the added values of using genome browsers at GrainGenes is
their integration with the BLAST service at GrainGenes. When users
run BLAST on their nucleotide or protein sequences at GrainGenes,
the results are linked to the hit regions in the browsers, which
allows users to go to those regions with a single mouse click.
GrainGenes also uniquely allows rubber-band selection of a genome
region on its JBrowse-based browsers, with automatic copying and
pasting of the underlying sequence data for subsequent BLAST
searches.

Those who are not familiar with genome browser operations and
their relationship to other pages at GrainGenes can benefit from
the several YouTube tutorial videos created by the GrainGenes
team. These are especially useful for those who would like to
learn how to move from genomic data to genetic data, and vice
versa, in GrainGenes. The videos are linked at
https://wheat.pw.usda.gov/GG3/tutorials.


2.5 Ensembl Plants

The Ensembl Plants platform (https://plants.ensembl.org) provides
a Web browser, databases, tools and programmatic access to
integrated public genomic data for a breadth of plant species
(Cunningham et al. 2022, Fig. 2.4). Ensembl Plants imports genomes
and community gene annotations into the platform, annotates
genomic repeat regions, imports variation data and identifies
homologues via Ensembl's comparative genomics analysis pipeline.
Users can access bioinformatics tools such as BLAST (Altschul
et al. 1990) for sequence similarity searching or the Ensembl
Variant Effect Predictor (VEP, McLaren et al. 2016) to predict the
functional consequences of variants.

The first version of the IWGSC Chromosome Survey Sequence (CSS)
and gene annotation for the cultivar CHINESE SPRING was made
available in Ensembl Plants in 2014. At that time, three other
Triticeae genomes were also included: A. tauschii, Hordeum vulgare
and T. urartu. The TGACv1 whole-genome assembly (Clavijo et al.
2017), which used the CSS reads to assign scaffolds to chromosome
arms, became available via Ensembl Plants in 2015 and was
subsequently replaced by the release of IWGSC RefSeq v1.0 in 2018,
although all assemblies can still be accessed via Ensembl's
archive sites. As of April 2023, Ensembl Plants contains an
additional 17 bread wheat cultivar genomes from the 10+ project
(Walkowiak et al. 2020,
https://www.wheatinitiative.org/10-wheat-genome-project), making
26 Triticeae genomes in total. Each of the bread wheat cultivars
displays the annotation from IWGSC RefSeq v1.1 projected onto the
cultivar assembly. In addition, de novo genes predicted by the
Plant Genome and Systems Biology Group (PGSB) at Helmholtz, Munich
and the Earlham Institute (EI) for the nine chromosome-level
assemblies are also displayed.

In addition to genome annotations, Ensembl Plants also displays
variation data, primarily from the 35K and 820K Axiom SNP
breeders' arrays, as provided by CerealsDB (Wilkinson et al.
2016b), and EMS mutations mapped from the EMS TILLING populations
(Krasileva et al. 2017) maintained by JIC's SeedStor
(https://www.seedstor.ac.uk) for CADENZA (hexaploid bread wheat)
and KRONOS (tetraploid durum wheat). This allows users to
visualise where variants are located with respect to the IWGSC
genome. Where those variants occur in the proximity of gene
models, the Ensembl Variant Effect Predictor provides estimates of
their likely impact on predicted gene and protein sequences. This
helps users to identify the variants most likely to disrupt genes,
and Ensembl Plants also provides a route to SeedStor for ordering
materials from the EMS populations that carry those variants.

Ensembl’s comparative genomics pipelines 
(Cunningham et al. 2019) provide gene/protein 
trees based on sequence homology and whole- 
genome alignments (WGA) between the major- 
ity of species within the platform. The IWGSC 
v1.0 assembly has gene trees and WGA avail- 
able which allow users to explore gene family 
loss and expansions, identifying orthologues 
and paralogues and regions of synteny between 
genomes in Ensembl Plants. The 10+ wheat cul- 
tivars have wheat-specific gene trees available 
which provide a mechanism for users to explore 
gene conservation within the current bread 
wheat pan-genome (Fig. 2.5). 


Fig. 2.4 Homepage of Ensembl Plants (https://plants.ensembl.org) [Retrieved in August 2023]

Fig. 2.5 Cultivar comparative gene tree for gene TraesCS3D02G273600, a heat shock protein located on
chromosome 3D in IWGSC CHINESE SPRING v1.0, shown in red [Retrieved in September 2022]


Ensembl (Cunningham et al. 2022) provides user access via
Web-based searches through the Ensembl browser or BioMart (which
allows structured user querying to select subsets of data), FTP
download access to complete sets of sequence data, annotations,
gene trees and databases, and programmatic access via Ensembl's
APIs. All Ensembl data and tools are open access and freely
available, and extensive documentation, training materials
(https://training.ensembl.org) and a helpdesk are available to
support user access. Ensembl Plants can also be accessed through
the Gramene resource (https://www.gramene.org, Tello-Ruiz et al.
2022).
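
As an illustration of what programmatic access can look like, the following minimal Python sketch builds a request to a public REST interface for the gene shown in Fig. 2.5. The server address (https://rest.ensembl.org), the lookup-by-identifier route and the function names are assumptions made for this example and are not taken from the text above; the current Ensembl documentation should be consulted before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Assumption: address of the public Ensembl REST server (not given in the chapter).
REST_SERVER = "https://rest.ensembl.org"


def lookup_url(stable_id):
    """Build a 'lookup by stable identifier' URL that asks for a JSON response."""
    query = urllib.parse.urlencode({"content-type": "application/json"})
    return f"{REST_SERVER}/lookup/id/{urllib.parse.quote(stable_id)}?{query}"


def fetch_record(stable_id, timeout=30):
    """Download and decode the record for one identifier (needs network access)."""
    with urllib.request.urlopen(lookup_url(stable_id), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only the URL is printed here, so the sketch runs without a network connection.
    print(lookup_url("TraesCS3D02G273600"))
```

Separating URL construction from the download keeps the first part testable offline; `fetch_record` is the hypothetical helper that would actually contact the server.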


2.6 Some Other Repositories

It is beyond the purview of this chapter to describe all wheat
repositories available worldwide, but the following are extremely
valuable sites that we mention briefly. The publications cited for
these repositories give more detail on their data content and
features.


The Triticeae Toolbox (T3) (https://wheat.triticeaetoolbox.org,
Blake et al. 2016). T3's mission is to create tools for
researchers who work on genotypic-phenotypic relationships. As
such, T3 played a central role in past projects with a strong
breeding focus, such as the Triticeae Coordinated Agricultural
Project (TCAP), and currently does so in the Wheat Coordinated
Agricultural Project (WheatCAP), both funded by the U.S.
Department of Agriculture, National Institute of Food and
Agriculture.

T3 houses many Web-based tools for breeders. It has capabilities
that allow users to (1) upload their raw genome-wide association
study (GWAS) and genotyping-by-sequencing datasets onto the
Website, (2) perform computations such as principal component
analyses and (3) visualise histograms for phenotypic observations,
scree plots of principal component eigenvalues, Q-Q plots
displaying observed and expected -log10 p-values and Manhattan
plots. In addition, T3 provides Web-based tools to generate
selection indexes for multiple traits simultaneously, which is a
useful method for breeding programs to select and advance
germplasm. As mentioned in the previous section, T3 has a very
close collaboration with GrainGenes. Both databases maintain and
share the same genome browsers, which enables users to go back and
forth between the two databases seamlessly.
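
To make the Q-Q plot mentioned above concrete, here is a minimal Python sketch of how the observed and expected -log10 p-values can be paired. It is a generic illustration and not T3's implementation; the function name and the choice of i/(n+1) as the expected quantile are assumptions of this example.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs for a Q-Q plot."""
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError("p-values must lie in the interval (0, 1]")
    n = len(p_values)
    # Observed values, strongest association (largest -log10 p) first.
    observed = sorted((-math.log10(p) for p in p_values), reverse=True)
    # Under the null hypothesis the i-th smallest of n p-values lies near i/(n+1).
    expected = [-math.log10(i / (n + 1)) for i in range(1, n + 1)]
    return list(zip(expected, observed))


if __name__ == "__main__":
    for expected, observed in qq_points([0.2, 0.03, 0.7, 0.0004]):
        print(f"expected {expected:.3f}  observed {observed:.3f}")
```

Points far above the diagonal (observed much larger than expected) are the ones that stand out on such a plot.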

Gramene (https://www.gramene.org, Tello-Ruiz et al. 2021). Gramene
offers rich data content and a wide range of tools for comparative
functional genomics for 118 reference genomes and 124,010 gene
family trees (Release #65, May 2022). These genomes encompass a
wide range of species, including various accessions of wheat
(similar to other databases discussed in this chapter). Gramene is
also the home of the Plant Reactome portal (Gupta et al. 2022),
which contains pathway information and gene expression displays
for 106 species. Gramene has a close partnership with Ensembl
Plants and displays genomes, gene models, variations and
annotations collaboratively. In addition to multiple visualisation
and analysis tools, such as Ensembl genome browsers, BLAST and FTP
download, it also houses the Ensembl-Compara-based GeneTrees
visualiser tool for sequence-based protein family classification
(Vilella et al. 2009).


2.7 WheatIS Data Discovery

An expert working group of the international Wheat Initiative has
built an international wheat information system, called WheatIS,
with the aim of providing Web-based one-stop access to all
available wheat data resources, bioinformatics tools and
recommended standards (http://wheatis.org/, Dzale Yeumo et al.
2017; Sen et al. 2020). The data repositories described in this
chapter are major data providers of the WheatIS federation, which
makes genomic, genetic and phenomic data available to the
community through a data discovery tool. This tool, developed by
INRAE-URGI, is a search engine that indexes the metadata of each
database of the federation and provides links back to the source
repositories. Long-term sustainability has been achieved through a
close collaboration with the ELIXIR European infrastructure for
life science to develop a common data discovery tool usable both
for WheatIS and for ELIXIR (FAIDARE, FAIR Data-finder for
Agricultural REsearch, https://urgi.versailles.inrae.fr/faidare/),
extended to all plant data.
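
The idea of indexing metadata and linking back to the source repository can be sketched in a few lines of Python. This is a toy illustration only, not the WheatIS or FAIDARE implementation; the record fields, the example entries and the example.org addresses are all hypothetical.

```python
from collections import defaultdict


def build_index(records):
    """Map each lower-cased word of a record description to record positions."""
    index = defaultdict(set)
    for position, record in enumerate(records):
        for word in record["description"].lower().split():
            index[word].add(position)
    return index


def search(index, records, term):
    """Return (source, url) pairs so the user can follow the link to the source."""
    hits = sorted(index.get(term.lower(), ()))
    return [(records[i]["source"], records[i]["url"]) for i in hits]


# Hypothetical metadata records; only the metadata is held centrally.
RECORDS = [
    {"source": "Repository A", "description": "physical map of chromosome 3B",
     "url": "https://example.org/a/1"},
    {"source": "Repository B", "description": "SNP markers on chromosome 3B",
     "url": "https://example.org/b/7"},
    {"source": "Repository B", "description": "QTL for grain yield",
     "url": "https://example.org/b/9"},
]

if __name__ == "__main__":
    index = build_index(RECORDS)
    for source, url in search(index, RECORDS, "3B"):
        print(source, url)
```

The design point is that the central index stores only descriptions and links, while the data themselves stay with the repositories that curate them.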

Figure 2.6 and Table 2.2 present the wheat resources queried by
the WheatIS data discovery tool in August 2023:
https://urgi.versailles.inrae.fr/wheatis/.

Fig. 2.6 Wheat resources queried by the WheatIS data discovery tool

Table 2.2 Number of data per wheat resource indexed by the WheatIS data discovery tool

Resource                                                 Institution                 Number of indexed data
TERRA-REF                                                U.S. Department of Energy   284
Wheat@URGI (including IWGSC data repository and GnpIS)   INRAE-URGI                  19,844,409
GrainGenes (including Wheat Gene Catalogue at Komugi)    USDA-ARS                    23,309
Ensembl Plants (including EVA)                           EMBL-EBI                    3,071,899
The Triticeae Toolbox (including UniProt)                Triticeae CAP               223,013
Gramene                                                  CSH, OSU                    229,851
AgroLD                                                   SouthGreen                  137,060
CIMMYT publications and datasets                         CIMMYT                      1,788
CR-EST, GBIS and MetaCrop                                IPK                         250,877
CrowsNest                                                PGSB                        13,324
KnetMiner                                                Rothamsted Research         108,474
PlantPhenoDB                                             IPG PAS                     6
Wheat pangenome                                          UWA                         167,167

ARS Agricultural Research Service; EMBL European Molecular Biology Laboratory; EBI European Bioinformatics
Institute; INRAE French National Research Institute for Agriculture, Food and Environment; URGI Research Unit
in Genomics and Bioinformatics; USDA U.S. Department of Agriculture; Triticeae CAP Triticeae Coordinated
Agricultural Project; CSH Cold Spring Harbor Laboratory; OSU Oregon State University; CIMMYT International
Maize and Wheat Improvement Center; IPK Leibniz Institute of Plant Genetics and Crop Plant Research; PGSB
Plant Genome and Systems Biology; IPG PAS Institute of Plant Genetics of the Polish Academy of Sciences; UWA
University of Western Australia


2.8 Conclusion

As wheat data production becomes increasingly abundant and
dispersed, data integration and FAIRification are fundamental. The
resources detailed in this chapter facilitate data discovery and
so help researchers and breeders to use genetic and genomic
information to improve wheat varieties. The involvement of the
wheat bioinformatics community in global initiatives for open
science through standardisation, such as AgBioData, ELIXIR or the
Research Data Alliance, requires a long-term commitment if it is
to continue contributing to research and plant breeding
worldwide.


Wheat Chromosomal Resources
and Their Role in Wheat Research


Hana Šimková, Petr Cápal and Jaroslav Doležel


Abstract 


Bread wheat (Triticum aestivum L.) is grown on a larger area of
land than any other crop, and its global significance is
challenged only by rice. Despite this socioeconomic importance,
wheat genome research lagged behind that of other crops for a long
time. The main obstacles to thorough genome analysis, gene cloning
and genome sequencing were the high complexity of the genome,
polyploidy and a high content of repetitive elements. A solution
to these problems came at the beginning of the new millennium with
the emergence of chromosome genomics, a new approach to studying
complex genomes after dissecting them into smaller parts: single
chromosomes or their arms. This lossless complexity reduction,
enabled by flow-cytometric chromosome sorting, reduced the time
and cost of experiments and simplified downstream analyses. Since
the approach overcomes difficulties due to sequence redundancy and
the presence of homoeologous subgenomes, chromosome genomics was
adopted by the International Wheat Genome Sequencing Consortium
(IWGSC) as the major strategy to sequence the bread wheat genome.
The dissection of the wheat genome into single chromosomes enabled
the generation of chromosome survey sequences and stimulated
international collaboration on producing a reference-quality
assembly by the clone-by-clone approach. In parallel, the
chromosomal resources were used for marker development, targeted
mapping and gene cloning. The most comprehensive approaches to
gene cloning, such as MutChromSeq and assembly via long-range
linkage, found their use even in the post-sequencing era. The
chapter provides a two-decade retrospective of chromosome genomics
applied in bread wheat and its relatives and reports on the
chromosomal resources generated and their applications.

Abbreviations

BAC           Bacterial artificial chromosome
CSS           Chromosome survey sequence
CS            cv. Chinese Spring
FISH          Fluorescence in situ hybridization
FISHIS        Fluorescence in situ hybridization in suspension
HICF          High information content fingerprinting
HMW DNA       High molecular weight DNA
IWGSC         International Wheat Genome Sequencing Consortium
MDA           Multiple-displacement amplification
MTP           Minimal tiling path
MutChromSeq   Mutant Chromosome Sequencing
OM            Optical map
TE            Transposable element
TACCA         TArgeted Chromosome-based Cloning via long-range Assembly
WGP           Whole genome profiling

3.1 Development of Wheat Chromosome Genomics


The development of the DNA sequencing technique by Sanger et al.
(1977) marked the beginning of genomics, with the prospect of
obtaining complete genome sequences and studying entire genomes.
Progress in DNA sequencing and genome assembly technologies, which
followed the pioneering projects on small bacterial genomes
(Fleischmann et al. 1995; Fraser et al. 1995), made it possible to
deliver the first genome of a plant, Arabidopsis thaliana
(Arabidopsis Genome Initiative 2000), followed by Oryza sativa
(International Rice Genome Sequencing Project 2005). Together with
the progress in human genome sequencing (Lander et al. 2001),
these achievements stimulated interest in producing a genome
sequence of hexaploid bread wheat (Triticum aestivum, 2n = 6x =
42), one of the three most important crops worldwide. This was a
daunting task at that time given its genome size exceeding 15 Gb
(IWGSC 2018), the presence of three homoeologous genomes and high
repeat content.

Despite the difficulties foreseen, participants 
of the workshop on wheat genome sequenc- 
ing held in Washington DC in 2003 agreed on 
a need for a bread wheat genome sequence 
(Gill et al. 2004). Among available strategies, it 
was decided to explore the use of DNA librar- 
ies prepared from individual chromosomes and 
chromosome arms for the assembly of a global 
physical map and chromosome sequencing. 
As individual chromosomes and chromosome 
arms represent only about 4-6% and 1-3% 
of the bread wheat genome, respectively, dis- 
secting the genome to chromosomes or even 
chromosome arms offered a dramatic and 
lossless reduction in DNA sample complex- 
ity to facilitate targeted development of DNA 
markers, gene mapping and cloning as well as 
genome sequencing. The chromosome-based 
approach avoided problems due to the presence 
of homoeologous DNA sequences and enabled a 
division of labor so that different groups could 
work on physical mapping and sequencing dif- 
ferent chromosomes simultaneously (Gill et al. 
2004). A principal condition for the application 
of this approach was the ability to purify par- 
ticular chromosomes and chromosome arms in 
sufficient numbers (-10?-109) so that enough 
DNA may be obtained. Until today, the only 
method suitable for this task is flow-cytometric 
sorting. 
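
The percentages quoted above can be checked with simple arithmetic. The short Python sketch below is an illustration under one simplifying assumption that is not made in the text: all 21 chromosomes, and all 42 arms, are treated as equal in size, so only the average share is computed.

```python
CHROMOSOMES = 21          # bread wheat, 2n = 6x = 42, i.e. 21 chromosome pairs
ARMS = 2 * CHROMOSOMES    # two arms per chromosome


def average_share(parts):
    """Average fraction of the genome held by one of `parts` equal-sized parts."""
    return 1.0 / parts


if __name__ == "__main__":
    print(f"average chromosome: {100 * average_share(CHROMOSOMES):.1f}% of the genome")
    print(f"average arm:        {100 * average_share(ARMS):.1f}% of the genome")
```

The averages (just under 5% and just over 2%) fall inside the quoted ranges of 4-6% and 1-3%; real chromosomes and arms differ in size, which is why the text gives ranges.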


3.1.1 Flow Cytogenetics 

Unlike microscopy, flow cytometry analyzes 
condensed mitotic metaphase chromosomes 
during their movement, one after another, in 
a narrow liquid stream. To distinguish this 
approach from microscopic analysis, the term 
flow cytogenetics has been coined. Prior to flow 
cytometry, chromosomes are stained with a DNA
fluorochrome so that they can be classified
according to relative DNA content. The analysis
can be performed at rates of -10? s so that large 
numbers of chromosomes can be interrogated to 
obtain statistically accurate data and potentially 
discriminate individual chromosomes. A histogram of DNA content
thus obtained is termed a flow karyotype, and ideally, each
chromosome is represented by a well-discriminated peak. In fact,
the extent to which a chromosome peak is discriminated from the
peaks of other chromosomes determines the purity of the sorted
fraction, that is, the frequency of contaminating chromosomes in
it. Not all flow cytometers are equipped with a sorting module,
and only some are designed to physically separate (sort)
microscopic particles with particular optical
parameters. Gray et al. (1975a, b), Stubblefield 
et al. (1975) and Carrano et al. (1976) were the 
first to confirm that flow cytometry can be used 
not only to classify mammalian chromosomes 
according to DNA content, but also to sort them. 
These experiments paved the way to the use 
of flow-sorted chromosomes during the initial 
phases of human genome sequencing (Van Dilla 
and Deaven 1990). 
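
The link between peak discrimination and purity can be illustrated with a small numerical model. The Python sketch below assumes that each chromosome type gives a Gaussian peak on the flow karyotype and that sorting collects everything inside a fluorescence window; both assumptions, and all numbers and names in the code, are made up for illustration and do not come from the studies cited here.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def sorted_purity(window, target, contaminants):
    """Fraction of sorted particles that belong to the target peak.

    window       -- (low, high) limits of the sort window on the fluorescence axis
    target       -- (mean, sd, relative_count) of the wanted chromosome peak
    contaminants -- list of (mean, sd, relative_count) for the other peaks
    """
    low, high = window

    def inside(peak):
        mean, sd, count = peak
        return count * (normal_cdf(high, mean, sd) - normal_cdf(low, mean, sd))

    wanted = inside(target)
    total = wanted + sum(inside(peak) for peak in contaminants)
    if total == 0.0:
        raise ValueError("the sort window contains no particles")
    return wanted / total


if __name__ == "__main__":
    window = (95.0, 105.0)
    well_separated = sorted_purity(window, (100.0, 2.0, 1.0), [(150.0, 2.0, 1.0)])
    overlapping = sorted_purity(window, (100.0, 2.0, 1.0), [(102.0, 2.0, 1.0)])
    print(f"well-separated peaks: {well_separated:.3f}")
    print(f"overlapping peaks:    {overlapping:.3f}")
```

In this toy model a peak that stands apart can be sorted almost pure, whereas two peaks of nearly the same DNA content contaminate each other whatever window is chosen.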

The samples for flow cytometry must have the form of a
concentrated suspension of intact chromosomes. In contrast to
animals and humans, their preparation in plants is hampered by the
low frequency of dividing mitotic cells and by the presence of a
rigid cell wall. A successful approach has been to artificially
induce cell cycle synchrony in root tips of hydroponically grown
seedlings, accumulate dividing cells at mitotic metaphase and
release intact chromosomes from formaldehyde-fixed root tips by
mechanical homogenization. This high-yielding procedure was
developed for faba bean (Doležel et al. 1992), and by optimizing
it for wheat, Vrána et al. (2000) laid a foundation for using
flow-sorted chromosomes in wheat genomics (Figs. 3.1 and 3.2).


3.1.2 Chromosome Sorting in Wheat 


The study of Vrána and co-workers (Vrána et al. 
2000) revealed that out of the 21 chromosomes 
of bread wheat, only chromosome 3B could be 
discriminated from other chromosomes and 
sorted at high purity (Fig. 3.3a). The remaining 
chromosomes formed three composite peaks 
on a flow karyotype, each of them represent- 
ing three to ten chromosomes, which could be 
only sorted as groups. In order to determine 
chromosome content in the flow-sorted frac- 
tions, samples of ~10° chromosomes were sorted 
onto a microscopic slide and microscopically identified after
fluorescence in situ hybridization with probes giving
chromosome-specific labeling patterns (Fig. 3.3e; Kubaláková
et al. 2002). The study of Vrána et al. (2000) indicated the
suitability of chromosomal stocks with altered chromosome sizes
for the purification of chromosomes other than 3B. In two
cultivars of wheat, the authors identified and sorted the
translocation chromosome 5BL.7BL, which is larger than chromosome
3B (Fig. 3.3c). A subsequent study by Kubaláková et al. (2002)
confirmed the potential of cytogenetic stocks. The most important
observation concerned the ability to sort any single chromosome
arm, either in the form of a telosome or an isochromosome. As
almost all telosomic lines were developed in the background of cv.
CHINESE SPRING (Sears and Sears 1978), their use offered a
possibility to analyze the wheat genome
chromosome-by-chromosome. In 13 double-ditelosomic lines, both
chromosome arms could be discriminated and sorted simultaneously
(Fig. 3.3b), saving the time needed to collect DNA from both arms
(Doležel et al. 2012).

Fig. 3.1 Major developments in wheat chromosomal genomics.
Milestones shown on the timeline: 2000, chromosome flow sorting in
wheat; 2003, preparation of HMW DNA; 2004, chromosomal BAC library
construction; 2008, chromosome MDA and BAC-based chromosomal
physical map; 2011, chromosome survey sequencing; 2013,
FISHIS-assisted sorting and chromosomal BAC libraries completed;
2014, chromosome survey sequences completed and BAC-based
chromosome sequence; 2016, gene cloning by MutChromSeq and
chromosomal optical map; 2017, gene cloning by TACCA; 2018, wheat
reference genome published

Fig. 3.2 Applications of wheat chromosomal resources.
Depending on the downstream application, flow-sorted chromosomes
can be processed by two distinct approaches. For applications
with high demand on DNA amount and contiguity, i.e., BAC
libraries, optical mapping and TArgeted Chromosome-based Cloning
via long-range Assembly (TACCA), high molecular weight (HMW) DNA
is prepared by purifying chromosomes embedded in agarose plugs.
Low molecular weight (LMW) DNA, to be used for short-read
sequencing or DArT marker development (DNA microarrays), is
obtained after treating chromosomal DNA in solution

While this advance made chromosome flow sorting technology ready
to support various genomics analyses in bread wheat (Fig. 3.2),
including genome sequencing, its dependence on cytogenetic stocks
limited its potential for marker development and gene cloning in
other wheat genotypes. To overcome this obstacle, Giorgi et al.
(2013) developed a protocol for fluorescent labeling of repetitive
DNA on chromosomes using fluorescence in situ hybridization in
suspension (FISHIS). Chromosome classification based on two
fluorescence parameters, namely DNA content (after staining with a
DNA fluorochrome) and fluorescence of regions containing DNA
repeats (typically GAA microsatellites) labeled with FITC, enabled
discrimination of chromosomes with the same or very similar DNA
content from each other. Depending on genotype, bivariate flow
karyotyping after FISHIS typically allows discrimination of ~13
out of 21 wheat chromosomes (Fig. 3.3d, e) and provides to date
the most powerful approach to dissect the wheat genome to single
chromosomes.

Fig. 3.3 Flow karyotyping of bread wheat. Histograms of relative
DAPI fluorescence intensities representing chromosomes of varying
sizes are termed flow karyotypes. a Flow karyotype of cv. CHINESE
SPRING consists of three composite peaks, harboring 3, 7 and 10
chromosomes, respectively, and a standalone peak representing the
largest wheat chromosome 3B. b Flow karyotype of 7D double
ditelosomic line, where both the long and the short arm of
chromosome 7D are discriminated and can be sorted simultaneously.
c The translocated chromosome 5BL.7BL, present in cv. ARINA and
some other cultivars, is the largest one in the karyotype and can
be sorted with a high purity. d Standard monoparametric flow
karyotype of cultivar CERTO, where three chromosomes from
composite peak III (2A, 2B and 6B) form a defined but still
unresolvable sub-population. e Bivariate flow karyotype of the
same cultivar (relative FITC fluorescence against relative DAPI
fluorescence), where the difference in relative abundance of GAA
repeat motif allows further discrimination of these chromosomes
and results in well-defined populations containing a single
chromosome type each. The chromosome 2B, shown in the inset, can
be sorted with purity exceeding 85%. For the purity check, FISH
was done with probes for GAA (green) and Afa repeats (red)


If the FISHIS procedure of Giorgi et al. (2013) is not compatible
with a downstream application of sorted chromosomes and, at the
same time, appropriate cytogenetic stocks are not available, the
option is to partition composite peaks as observed on monovariate
flow karyotypes (Fig. 3.3a) (Vrána et al. 2015). Although this
approach does not allow discrimination and sorting of single
chromosomes, it is suitable for obtaining sub-genomic fractions
comprising only a few chromosomes, with one of them being more
abundant. Vrána et al. (2015) calculated a so-called enrichment
factor, defined as the ratio of the proportion of a chromosome's
DNA in a sorted fraction to its proportion in the wheat genome,
and found that a fivefold enrichment was obtained for 17 out of 21
wheat chromosomes. Importantly, sub-genomic fractions for 15 out
of the 21 chromosomes were not contaminated by homoeologs.
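
A worked example may help with the enrichment factor. The Python sketch below reads it as the share of a chromosome's DNA in the sorted fraction divided by its share in the whole genome, which is the reading under which a fivefold enrichment is a gain; the function name and the numbers are hypothetical and are not data from Vrána et al. (2015).

```python
def enrichment_factor(share_in_fraction, share_in_genome):
    """Ratio of a chromosome's DNA share after sorting to its share in the genome."""
    if not 0.0 < share_in_genome <= 1.0:
        raise ValueError("share_in_genome must lie in (0, 1]")
    if not 0.0 <= share_in_fraction <= 1.0:
        raise ValueError("share_in_fraction must lie in [0, 1]")
    return share_in_fraction / share_in_genome


if __name__ == "__main__":
    # Hypothetical chromosome: 5% of the genome, 25% of the sorted fraction.
    print(f"{enrichment_factor(0.25, 0.05):.1f}-fold enrichment")
```

With these made-up values the chromosome is five times more abundant in the sorted fraction than in unsorted genomic DNA.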


3.1.3 Sorting Chromosomes of Wild Wheat Relatives


The method for flow-cytometric chromosome analysis and sorting,
originally developed for hexaploid bread wheat and subsequently
modified for tetraploid durum wheat (Triticum turgidum Desf. var.
durum, 2n = 4x = 28; Kubaláková et al. 2005), was also found to be
suitable for sorting chromosomes from their wild relatives. Two
options were explored. One involved sorting chromosomes from alien
chromosome introgression lines of wheat. The samples are prepared
from synchronized wheat root tips and, if the alien chromosome can
be discriminated on a flow karyotype, it may be sorted (Molnár
et al. 2011, 2015; Zwyrtková et al. 2022). In a similar manner,
wheat chromosomes carrying introgressions from wild relatives can
be purified (Tiwari et al. 2014; Janáková et al. 2019; Bansal
et al. 2020). The second, more straightforward option is to sort
chromosomes directly from wild relatives. Thus, the protocol of
Vrána et al. (2000) for wheat has been optimized for a variety of
species from the Aegilops, Agropyron and Haynaldia (Dasypyrum)
genera (summarized in Doležel et al. 2021). While in some of them
(like Aegilops comosa) all chromosomes may be discriminated and
sorted (Said et al. 2021), in the majority of species (including
Aegilops geniculata, Aegilops biuncialis, Aegilops cylindrica,
Haynaldia villosa, Agropyron cristatum and others) chromosomes can
only be sorted in groups of two to five (Molnár et al. 2011, 2015;
Grosso et al. 2012; Said et al. 2019). As in the case of wheat,
fluorescent labeling of chromosomes by FISHIS prior to flow
cytometry increased the number of chromosomes that could be
discriminated and sorted. The availability of separated
chromosomes of the relatives enabled comparative studies with the
bread wheat genome (Molnár et al. 2014, 2016) and has been applied
to support cloning of genes from the tertiary gene pool (see
Sect. 3.5.1).


3.2 Toward Bread Wheat Reference Genome


The need for a quality bread wheat genome sequence that would
provide access to the complete gene catalogue, an unlimited number
of molecular markers to support genome-based selection of new
varieties and a framework for the efficient exploitation of
natural and induced genetic diversity (Choulet et al. 2014a)
stimulated the establishment of the International Wheat Genome
Sequencing Consortium, a collaborative platform launched in 2005
(https://www.wheatgenome.org). By that time, a proven strategy for
obtaining high-quality reference sequences of large genomes was
the clone-by-clone approach, i.e., sequencing clones from
large-insert DNA libraries ordered in physical maps. These
constituted a technology-neutral resource for accessing complex
genomes, enabling possible resequencing of the ordered clones by
more advanced technologies. Considering the ability to dissect the
wheat genome into individual chromosomes or chromosome arms (Vrána
et al. 2000; Kubaláková et al. 2002), and after confirming the
feasibility of constructing large-insert DNA libraries from the
flow-sorted chromosomes (Šafář et al. 2004; Janda et al. 2004),
the Consortium settled on coupling chromosome purification with
the clone-by-clone strategy and producing clone-based physical
maps of individual wheat chromosomes, which would allow the
engagement of multiple teams in the challenging sequencing effort.


3.2.1 Generation of Chromosomal BAC Resources


The prerequisite of the proposed strategy was 
the ability to separate by flow sorting each 
of bread wheat chromosomes or chromo- 
some arms. This was only possible in cultivar 
CHINESE SPRING (CS), for which a com- 
plete set of telosomic lines, essential to sort the 
chromosome arms, was available (Sears and 
Sears 1978), predestining the cultivar to become 
the reference genome of bread wheat. The pri- 
mary resource needed to construct a clone- 
based physical map is a large-insert genomic 
DNA library, commonly cloned in the bacterial 
artificial chromosome (BAC) vector, typically 
bearing inserts of 100—200 kb. To generate a 
library of these parameters, several micrograms 
of high molecular weight (HMW) DNA are 
needed. Achieving this from the flow-sorted 
material involved the elaboration of a custom- 
ized protocol (Simková etal. 2003) including 
DNA preparation in agarose plugs (Fig. 3.2), 
which enabled cumulating samples from mul- 
tiple sorting days. Based on this advance, Šafář 
etal. (2004) constructed the first-ever chro- 
mosome-specific BAC library in a eukaryotic 
organism. The library, prepared from two mil- 
lion 3B chromosomes flow-sorted over 18 work- 
ing days, comprised 67,968 clones with 103 kb 
average insert size, representing 6.2 equivalents 
of the chromosome 3B, whose molecular size 
is close to one gigabase. Further improvements 
in the procedure permitted the construction of 
BAC libraries with chromosome coverage up to 
18× and average insert size exceeding 120 kb
(https://olomouc.ueb.cas.cz/en/resources/dna-libraries;
Šafář et al. 2010; Table 3.1 and
references therein). The effort toward preparing
the full set of CS libraries for the chromosomal
physical maps lasted over ten years and
was completed at the end of 2013 (Fig. 3.1).
Individual clones and BAC libraries used to 
construct chromosome-specific physical maps 
are publicly available and can be obtained at 
https://cnrgv.toulouse.inrae.fr/en/Library/Wheat. 
Besides the ‘CHINESE SPRING’ BAC libraries
generated for the reference genome project,
several customized chromosomal libraries from
other cultivars were created for gene cloning
projects, including a 3B-specific
library from cv. HOPE (Mago et al. 2014) and a
BAC library from the 4AL arm of cv. TÄHTI, bearing
an introgressed segment of Triticum militinae
(Janáková et al. 2019; Table 3.1).
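
The coverage figure quoted for the 3B library can be checked with a short calculation. The sketch below is an illustration rather than the authors' procedure. It treats chromosome 3B as exactly 1 Gb (the text says only "close to one gigabase") and uses the Clarke-Carbon formula, P = 1 − (1 − I/G)^N, for the probability that a given locus is present in a library of N clones with mean insert size I cloned from a target of size G. The function names and the `usable_fraction` parameter are hypothetical. The naive product of clone number and insert size gives about 7.0 chromosome equivalents; the lower published value of 6.2 presumably reflects a correction (for example, for clones that do not originate from 3B) that is not specified in this section.

```python
def library_coverage(n_clones, mean_insert_kb, target_size_kb, usable_fraction=1.0):
    """Number of target equivalents represented by a clone library.

    usable_fraction is a hypothetical knob for the share of clones that
    really derive from the target chromosome (1.0 = no correction).
    """
    return n_clones * usable_fraction * mean_insert_kb / target_size_kb


def probability_locus_present(n_clones, mean_insert_kb, target_size_kb):
    """Clarke-Carbon estimate: P = 1 - (1 - I/G)^N."""
    return 1.0 - (1.0 - mean_insert_kb / target_size_kb) ** n_clones


# Values from the text; the 3B size is rounded to 1 Gb as an assumption.
CHR_3B_KB = 1_000_000
naive = library_coverage(67_968, 103, CHR_3B_KB)
p_present = probability_locus_present(67_968, 103, CHR_3B_KB)

print(f"naive coverage: {naive:.2f}x")
print(f"P(locus present): {p_present:.4f}")
```

Even at the more conservative coverage reported in the text, the same formula indicates that a randomly chosen locus is very likely to be represented, which is what makes such a library practical to screen.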

Upon their construction, the CS libraries 
were distributed among national teams engaged 
in the IWGSC effort who embarked on con- 
structing physical maps. In a proof-of-concept 
experiment, Paux and co-workers (2008) gen- 
erated the first chromosomal physical map 
from chromosome 3B, employing SNaPshot-based
High Information Content Fingerprinting
(HICF) technology (Luo et al. 2003) to generate
fingerprints and FingerPrinted Contig (FPC)
software to assemble the physical map and
select a minimal tiling path (MTP) for sequencing.
This achievement validated the feasibility
of constructing sequence-ready physical maps 
of hexaploid wheat by the chromosome-by- 
chromosome approach and the strategy was 
subsequently followed for other chromosome 
arms (Table 3.1; IWGSC 2018). As alternative 
procedures, Whole Genome Profiling (WGP;
van Oeveren et al. 2011) was applied for BAC
fingerprinting in several projects and Linear
Topological Contig (LTC; Frenkel et al. 2010)
software was developed and utilized for map
assembly and validation. Procedures applied for
individual chromosomes/arms are summarized
in IWGSC (2018). The resulting chromosomal
physical maps are available at
https://urgi.versailles.inra.fr/download/iwgsc/Physical_maps/
and displayable at
https://urgi.versailles.inra.fr/gb2/gbrowse/wheat_phys_pub/. In addition
to the construction of physical maps for several 
chromosomes, the WGP technology was utilized 
to profile MTP clones identified from chromo- 
some physical maps constructed previously 
by the HICF procedure. The WGP tags thus
generated for all 21 wheat chromosomes were used
to support the assembly of the IWGSC RefSeq
v1.0 genome and are available for download
as the IWGSC-BayerCropScience WGP™ tags at
https://urgi.versailles.inra.fr/download/iwgsc/IWGSC_BayerCropScience_WGPTM_tags.
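
The idea of a minimal tiling path can be illustrated with a small greedy routine. This is a simplified sketch, not the algorithm implemented in FPC or LTC: those tools work from fingerprint overlaps, whereas the example assumes that each clone already has start and end coordinates within a contig. The `Clone` tuple layout and the function name are hypothetical.

```python
from typing import List, Tuple

Clone = Tuple[str, int, int]  # (name, start, end) in contig coordinates


def minimal_tiling_path(clones: List[Clone]) -> List[Clone]:
    """Greedy interval cover: the fewest clones that span the whole contig."""
    if not clones:
        return []
    # Sort by start; for equal starts, the longer clone comes first.
    ordered = sorted(clones, key=lambda c: (c[1], -c[2]))
    contig_end = max(c[2] for c in ordered)
    covered_to = ordered[0][1]  # left edge of the contig
    path: List[Clone] = []
    i = 0
    while covered_to < contig_end:
        best = None
        # Among clones that start inside the covered part, take the one
        # that reaches furthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best)
        covered_to = best[2]
    return path


example = [("a", 0, 100), ("b", 50, 180), ("c", 90, 200),
           ("d", 150, 300), ("e", 190, 310)]
mtp = minimal_tiling_path(example)
print([name for name, _, _ in mtp])
```

In practice a minimum overlap between neighbouring clones would also be required so that the sequenced clones can be joined reliably; that constraint is omitted here for brevity.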

3.2.2 BAC Clone Sequencing


Availability of BAC clones ordered in chromosomal
physical maps opened an avenue to systematic
analyses of the bread wheat genome and
its selected parts. The early studies, based on
sequencing the ends of BAC clones by Sanger
technology, provided first insights into the gene and
repeat content of particular chromosomes, enabled
comparative analyses of homoeologous
chromosomes and delivered information for
targeted marker development (Paux et al. 2006; 
Sehgal et al. 2012; Lucas et al. 2012). 

Later studies, employing next-generation
sequencing of whole BAC contigs, provided
more comprehensive information about the
organization of genes and transposable elements
(TEs). Choulet et al. (2010) sequenced and
annotated 13 BAC contigs, totaling 18 Mb of
sequence, selected from different regions of the
3B chromosome and revealed that genes were
present along the entire chromosome and clustered
mainly into numerous small islands of
3–4 genes separated by large blocks of repetitive
elements. They observed that wheat genome
expansion had occurred homogeneously along
the chromosome through specific bursts of
TEs. Bartoš et al. (2012), after sequencing
a megabase-sized region from wheat arm 3DS 
and comparing it with the homoeologous region 
on wheat chromosome 3B, revealed similar rates 
of non-collinear gene insertion in wheat B and 
D subgenomes with a majority of gene duplications
occurring before their divergence. Li et al.
(2013) provided valuable information about
the structure of wheat centromeres. Analyzing a
1.1-Mb region from the centromere of chromosome
3B, they revealed that 96% of the DNA
consisted of TEs. The youngest elements, CRW
and Quinta, were targeted by the centromere-specific
histone H3 variant CENH3, the marker
of the functional centromere. In contrast to the
TEs, long arrays of satellite repeats found in
the region were not associated with CENH3.
Several other studies employing sequencing
of BAC contigs focused on the analysis of narrow
regions comprising their genes of interest (Breen
et al. 2010; Mago et al. 2014; Janáková et al.
2019; Tulpová et al. 2019b).

Although these studies markedly advanced
knowledge of the bread wheat genome, the
major breakthrough came only with the generation
of chromosome-scale sequence assemblies.
Choulet and co-workers (2014b) produced a
BAC-based reference sequence of the largest
bread wheat chromosome, 3B. After sequencing
8452 BAC clones, representing the 3B MTP,
the authors assembled a sequence of 833 Mb
split into 2808 scaffolds, 1358 of which, containing
774 Mb of sequence, had a known position
on the chromosome. The assembly comprised
5326 protein-coding genes, 1938 pseudogenes
and 85% transposable elements. Most interestingly,
the distribution of structural and functional
features along the chromosome revealed
partitioning correlated with meiotic recom- 
bination. Comparative analyses with other 
grass genomes indicated high wheat-specific 
inter- and intrachromosomal gene duplication 
activities that were postulated to be sources of
variability for adaptation. As a contribution to
the IWGSC sequencing effort, sequence assemblies
of BAC clones representing complete or
partial MTPs of seven chromosomes and two
chromosome arms were produced (Table 3.1
and references therein; IWGSC 2018) and are
publicly available at
https://urgi.versailles.inrae.fr/download/iwgsc/BAC_Assemblies/. These
assemblies, complemented by information from
chromosomal physical maps and, for group 7
chromosomes, also chromosomal optical maps,
were applied to support the assembly of the
bread wheat reference genome, IWGSC RefSeq
v1.0 (IWGSC 2018), as described in Chap. 2.

It is now clear that whole-genome
shotgun has become the predominant approach to
sequencing, even for large polyploid genomes.
Still, the generated wheat chromosomal physical
maps and BAC clones integrated therein remain
a valuable genomic resource for bread wheat,
enabling fast access to, and detailed analysis
of, a region of interest. The availability of BAC
clones with a known genomic position facilitated
focused and affordable resequencing of
a region of interest with long-read technologies,
revealing discrepancies and missing segments in
the previously generated bread wheat assemblies
(Kapustová et al. 2019; Tulpová et al. 2019b).


3.3 Chromosome Survey Sequencing


While the generation of the full set of chromo- 
somal libraries, physical maps and BAC clone 
sequences proved to be a long-distance run, 
the requirement for homoeolog-resolved wheat 
genome information was increasing over time. 
Apparently, this demand could be met by low- 
pass chromosome sequencing, which would pro- 
vide approximate information about the genic 
component of individual chromosomes. The 
separation of each bread wheat chromosome 
or chromosome arm was, in principle, feasible 
but the yield of flow-sorted chromosomes, typically
1–2 × 10⁵ per sorting day, did not meet the
demands of the early sequencing technologies
for DNA input, which was in the microgram
range. Coupling of chromosome flow sorting
with multiple-displacement amplification
(MDA) of the chromosomal DNA, originally 
developed for physical mapping on DNA micro- 
arrays (Šimková et al. 2008), opened the door 
to shotgun sequencing of cereal chromosomes 
one-by-one. Wheat genome researchers adopted 
the strategy of chromosome survey sequenc- 
ing (CSS) developed for barley (Mayer et al. 
2009, 2011). In barley, low-coverage (1–3×)
chromosomal data, obtained by 454 sequenc- 
ing, were compared with reference genomes of 
rice, sorghum and Brachypodium, and EST or 
full-length-cDNA datasets, which led to the esti- 
mation of gene content for each of the barley 
chromosomes. Moreover, an integration of the 
shotgun sequence information with the collinear 
gene order of orthologous rice, sorghum and 
Brachypodium genes allowed proposing virtual 
gene order maps of individual chromosomes. 
The syntenic integration, known as genome zip- 
per, resolved gene order in regions with limited 
genetic resolution, such as genetic centromeres, 
which were intractable to genetic mapping. 
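
The principle behind the genome zipper can be sketched in a few lines. This is a deliberately simplified illustration, not the published pipeline, which integrates several reference genomes, EST and cDNA evidence and marker maps. The sketch assumes a single reference gene order, a set of reference genes that have matching shotgun reads from the sorted chromosome, and a few "anchor" genes whose genetic positions (in cM) are known; all names are hypothetical. Genes between two anchors receive an interpolated position, and genes that end up at the same genetic position are ordered by the reference.

```python
def virtual_gene_order(reference_order, genes_with_hits, anchors):
    """Order detected genes along a chromosome by synteny.

    reference_order : gene IDs in their order in the reference genome
    genes_with_hits : reference genes matched by chromosome shotgun reads
    anchors         : dict gene ID -> genetic position (cM) for genes
                      tied to mapped markers
    """
    index = {gene: i for i, gene in enumerate(reference_order)}
    anchor_points = sorted((index[g], cm) for g, cm in anchors.items() if g in index)
    placed = []
    for gene in set(genes_with_hits):
        if gene not in index:
            continue  # no ortholog in the reference order
        i = index[gene]
        left = max((p for p in anchor_points if p[0] <= i), default=None)
        right = min((p for p in anchor_points if p[0] >= i), default=None)
        if left is None or right is None:
            continue  # outside the anchored interval
        if left[0] == right[0]:
            cm = left[1]
        else:
            frac = (i - left[0]) / (right[0] - left[0])
            cm = left[1] + frac * (right[1] - left[1])
        placed.append((cm, i, gene))
    # Ties in cM (e.g. around a genetic centromere) fall back on reference order.
    placed.sort()
    return [(gene, round(cm, 2)) for cm, _, gene in placed]


reference = ["g1", "g2", "g3", "g4", "g5", "g6"]
anchors = {"g1": 0.0, "g4": 3.0, "g6": 3.0}  # g4 and g6 co-segregate
hits = ["g5", "g2", "g3", "gX", "g6"]
zipper = virtual_gene_order(reference, hits, anchors)
print(zipper)
```

In the toy data, g5 and g6 share one genetic position and cannot be separated by recombination, yet they still receive an order from the reference, which mirrors how the approach resolves regions of limited genetic resolution.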


The first CSS experiments in bread
wheat compared chromosome arms
of homoeologous group 1 (Wicker et al. 2011)
and methodologically followed the barley
model, employing low-pass 454 sequencing.
The study revealed that all three wheat
subgenomes had similar sets of genes that were
syntenic with the model grass genomes but that
genic sequences in non-syntenic positions
outnumbered those in syntenic ones.
Further analysis indicated that a large propor- 
tion of the genes that were found in only one 
of the three homoeologous wheat chromosomes 
were most probably pseudogenes resulting from 
transposon activity and double-strand break 
repair. These findings were supported by a study
of Akhunov et al. (2013) who, working with
CSSs of both arms of chromosome 3A, found
that ~35% of genes had experienced structural
rearrangements leading to a variety of missense
and nonsense mutations, a finding concordant
with other studies indicating ongoing
pseudogenization of the bread wheat genome.
Another focus of the CSS studies was the
evolutionary rearrangement of wheat chromosomes.
Hernandez et al. (2012) analyzed bread wheat
chromosome 4A, which has undergone a major 
series of evolutionary rearrangements. Using 
the genome zipper approach, the authors pro- 
duced an ordered gene map of chromosome 4A, 
embracing ~85% of its total gene content, which 
enabled precise localization of the various trans- 
location and inversion breakpoints on chromo- 
some 4A that differentiate it from its progenitor 
chromosome in the A-subgenome diploid donor. 

In contrast to the above studies, Berkman and
co-workers, aiming to shotgun sequence the wheat
7DS arm, favored the more cost-efficient
Illumina technology and compensated for
its short reads (75–100 bp) by higher sequencing
coverage (34×), which allowed a partial
assembly of the reads and capture of ~40% of
the sequence content of the chromosome arm
(Berkman et al. 2011). Using the same technology,
the team proceeded with sequencing
the 7BS arm (Berkman et al. 2012) and supplemented
the 4A study by delimiting the 7BS
segment that was involved in the reciprocal
translocation that gave rise to the modern 4A
chromosome. After extending the sequencing
effort to all group 7 homoeologs (Berkman
et al. 2013), the team compared the sequences
and concluded that there had been more gene
loss in chromosomes 7A and 7B than in 7D.
Chromosome survey sequences of additional 
chromosomes/arms followed and were mostly 
utilized in estimating the gene and repeat content
of particular chromosomes (Vitulo et al.
2011; Tanaka et al. 2014; Sergeeva et al. 2014;
Helguera et al. 2015; Garbus et al. 2015; Kaur
et al. 2019), synteny-based ordering of emerging
clone-based physical maps (Lucas et al. 2013),
identifying miRNA-coding sequences (Vitulo
et al. 2011; Kantar et al. 2012; Deng et al.
2014; Tanaka et al. 2014) and delimiting
lineage-specific translocations (Lucas et al. 2014).
Utilization of the chromosome sequencing for 
gene mapping and cloning is described further 
in Sect. 3.5.1. 
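
The coverage values mentioned for the survey sequencing projects (1–3× for the 454-based surveys, 34× for the Illumina-based 7DS project) can be put in perspective with the Lander-Waterman (Poisson) model, in which the expected fraction of a target sampled by at least one read is 1 − e^(−c) for a coverage c = N·L/G. The sketch below is only a back-of-the-envelope illustration with hypothetical function names. The model assumes uniformly random reads and says nothing about assembly, so it should not be read as a prediction of the assembled fraction; repeats and amplification bias, which it ignores, are likely reasons why a partial assembly captures far less than the sampled fraction.

```python
import math


def expected_coverage(n_reads, read_len_bp, target_size_bp):
    """Mean sequencing depth c = N * L / G."""
    return n_reads * read_len_bp / target_size_bp


def fraction_sampled(coverage):
    """Expected fraction of bases hit by at least one read (Poisson model)."""
    return 1.0 - math.exp(-coverage)


for c in (1, 2, 3, 34):
    print(f"{c:>2}x coverage -> fraction sampled {fraction_sampled(c):.3f}")
```

At 1× roughly two thirds of the target is expected to be touched by a read, which is consistent with the text's description of low-pass data as giving approximate rather than complete information about the genic component.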

Chromosome survey sequencing in
bread wheat culminated in a joint effort
coordinated by the IWGSC, which exploited
the existing Illumina-based CSSs and complemented
them with newly produced Illumina data
for the remaining chromosomes. The sequences
were applied to generate draft assemblies and 
genome zippers for all wheat chromosomes 
(IWGSC 2014). As a result, a total of 124,201 
gene loci were annotated and more than 75,000 
genes were positioned along chromosomes. The 
IWGSC team anchored more than 3.6 million 
marker loci to chromosome sequences, uncov- 
ered the molecular organization of the three 
subgenomes and described patterns in gene 
expression across the subgenomes. The study 
also provided new insights into the phylogeny of 
hexaploid bread wheat, which was elaborated in 
detail in an accompanying study of Marcussen 
et al. (2014). Moreover, this new wheat genome 
information was used as a reference to analyze 
the cell type-specific expression of homoe- 
ologous genes in the developing wheat grain 
(Pfeifer et al. 2014). 
The technique of chromosome survey
sequencing soon expanded beyond the cultivated
crop and was successfully applied to explore
individual chromosomes or whole genomes of
close wheat relatives, such as Aegilops tauschii
(Akpinar et al. 2015a) and Triticum dicoccoides
(Akpinar et al. 2015c, 2018), and even species
from the tertiary gene pool, including Ae.
geniculata (Tiwari et al. 2015), H. villosa (Xiao
et al. 2017), Ae. comosa, Ae. umbellulata
(Said et al. 2021) and A. cristatum (Zwyrtková
et al. 2022). These studies provided information on
chromosome gene content and organization,
enabling comparative studies important for gene
transfer from the wild species to the crop, and
identified sequences suitable for marker
development for tracing introgressions in wheat.
Specific examples are provided in Sect. 3.5.1 
and Table 3.2. 


3.4 Optical Mapping 
Extensive experience with preparing quality 
HMW DNA from flow-sorted chromosomes 
paved the way to establish a new branch of 
wheat chromosomal genomics—chromosome 
optical mapping (OM). The OM technology,
commercialized by Bionano Genomics and
therefore also known as Bionano genome mapping,
is a physical mapping technique based on
labeling and imaging short sequence motifs
along 150 kb to 1 Mb long DNA molecules
(Lam et al. 2012). The resulting restriction maps,
assembled from high-coverage single-molecule
data, are composed of contigs of up to >100 Mb in
size, which are instrumental in the finishing steps
of genome assemblies by enabling contig scaffolding,
gap sizing and assembly validation.
Optical maps also provide a high-resolution and
cost-effective tool for comparative structural
genomics.
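
To compare a sequence assembly with an optical map, the sequence is first converted into the same representation: a list of positions at which the labeled motif occurs. The sketch below shows this in silico step under stated assumptions. The motif GCTCTTC is the recognition site of the nicking enzyme Nt.BspQI; its use here is an assumption for illustration, since the text does not name the enzyme, and the function names are hypothetical. The code is not part of any Bionano software.

```python
def label_sites(sequence, motif="GCTCTTC"):
    """Sorted 0-based start positions of the motif on either DNA strand."""
    complement = str.maketrans("ACGT", "TGCA")
    reverse_complement = motif.translate(complement)[::-1]
    seq = sequence.upper()
    sites = set()
    for pattern in {motif, reverse_complement}:
        pos = seq.find(pattern)
        while pos != -1:
            sites.add(pos)
            pos = seq.find(pattern, pos + 1)
    return sorted(sites)


def label_intervals(sites):
    """Distances between neighbouring labels, the pattern an optical map records."""
    return [b - a for a, b in zip(sites, sites[1:])]


toy = "AA" + "GCTCTTC" + "AAAA" + "GAAGAGC" + "TT"
sites = label_sites(toy)
print(sites, label_intervals(sites))
```

Matching the resulting interval pattern against the intervals observed on imaged molecules is what allows sequence contigs to be placed, oriented and checked against the optical map.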

Table 3.2 Leveraging wheat chromosomal resources in gene mapping and cloning

Phenotype                          Locus          Sorted chrom./arm   Applied approach
Stem rust resistance               Sr2            3B                  BAC
                                   SuSr-D1        7D                  MutChromSeq
Green bug resistance               Gb3            7DL                 BAC
Powdery mildew resistance          QPm.tut-4A     4AL                 SynSNP
                                                                      CSS, ChromSeq, RICh
                                                                      BAC, CSS
                                   Pm2            5D                  MutChromSeq
                                   Pm21           6V                  TACCA
                                   Pm4            2A                  MutChromSeq
                                   Pm1a           7A                  ChromSeq
Species cytoplasm specific         SCS            1D                  SynSNP
Leaf rust resistance               Lr14a          7BL                 SynSNP
                                   Lr57           5Mᵍ                 ChromSeq
                                   Lr22a          2D                  TACCA
                                   Lr49           4B                  ChromSeq
                                   Lr76           5D/5U               ChromSeq
                                   Lr14a          7BL.5BL             MutChromSeq
Glume blotch resistance            QSng.sfr-3BS   3B                  ChromSeq
Stripe rust resistance             Yr40           5Mᵍ                 ChromSeq
                                   YrAW1          4AL                 ChromSeq
                                   Yr70           5D/5U               ChromSeq
Russian wheat aphid resistance     Dn2401         7DS                 CSS, SynSNP
                                                                      BAC, OM
Pre-harvest sprouting resistance   Phs-A1         4AL                 BAC
Semi-dwarfism                      Rht18          6A                  MutChromSeq
Yellow Early Senescence            YES-1          3A                  ChromSeq
Fusarium head blight resistance    Fhb            7EL                 ChromSeq

BAC BAC-based physical map/BAC sequencing
CSS Chromosome survey sequence
ChromSeq Chromosome sequencing
MutChromSeq Mutant chromosome sequencing
OM Optical map
RICh Rearrangement identification and characterization
SynSNP Synteny-based SNP marker development
TACCA TArgeted chromosome-based cloning via long-range assembly

Staňková et al. (2016) demonstrated the feasibility
of generating optical maps from DNA
of flow-sorted chromosomes and constructed
the first-ever optical map for the bread wheat
genome. Using 1.6 million flow-sorted 7DS
chromosome arms and the first-generation platform
of Bionano Genomics, the authors prepared
a map consisting of 371 contigs with an N50
of 1.3 Mb, which supported a physical map and
a BAC-based sequence assembly of the chromosome
arm (Tulpová et al. 2019a). Applied
in a gene cloning project, the OM served as a
targeted tool for sequence validation and analysis
of structural variability in a region of interest
(Tulpová et al. 2019b). Similar maps have been
constructed for other group-7 chromosome arms
and were used in the process of assembling the
wheat reference genome (IWGSC 2018), as well
as a complementary BAC-based assembly of
chromosome 7A (Keeble-Gagnère et al. 2018).
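
The N50 statistic used to describe the 7DS optical map (and sequence assemblies elsewhere in this chapter) is the length of the contig at which the cumulative length of contigs, taken from longest to shortest, first reaches half of the total. A minimal sketch follows; the function name and the toy lengths are illustrative and are not data from the cited studies.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50 is undefined for an empty list")


toy_contigs_mb = [2, 3, 4, 5, 6]
print(n50(toy_contigs_mb))
```

Because N50 weights contigs by their length, a map with a few hundred contigs can still have an N50 above a megabase, as reported for 7DS.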

Another set of chromosomal optical maps
was prepared from chromosome arms 1AS,
1BS, 6BS and 5DS, the last being generated
on the second-generation platform of Bionano
Genomics, with the aim of positioning and
characterizing 45S rDNA loci located on those arms.
The chromosome-based approach applied in
the rDNA project enabled analyzing the loci
one by one and provided more comprehensive
information about individual loci than was achieved
in long-read bread wheat assemblies (Tulpová
et al. 2022).


3.5 Gene Mapping and Cloning 

In parallel with the chromosome sequencing 
efforts, the wheat community started exploiting 
flow-sorted chromosomes for targeted marker 
development, aiming to generate a high-density 
map in a region of interest and, possibly, clone 
a gene by a map-based approach. This conventional
strategy was later complemented by new
methods of ‘rapid gene cloning’ (reviewed in
Bettgenhaeuser and Krattinger 2019). Some of
these still capitalize on the complexity reduction
by chromosome flow sorting but they avoid the
lengthy step of marker development and map
saturation while employing mutation genetics
and comprehensive sequencing techniques to
assemble a highly contiguous sequence for the
chromosome of interest.


3.5.1 Marker Development and Map-Based Gene Cloning


The first effort toward massive marker development
from a selected chromosome or chromosome
arm was tied to the microarray
platform of Diversity Arrays Technology (DArT),
able to identify and utilize polymorphic DNA
markers without knowledge of the underlying
sequence (Jaccoud et al. 2001). Wenzl
et al. (2010) demonstrated that a chromosome-enriched
DArT array could be developed from
only a few nanograms of chromosomal DNA.
Of 711 polymorphic markers derived from
non-amplified DNA of bread wheat chromosome
3B, 553 (78%) mapped to the chromosome, and
even higher efficiency (87%) was observed for
the short arm of bread wheat chromosome 1B
(1BS).

Before the availability of wheat chromo- 
somal survey sequences, researchers aiming to 
develop new markers for their locus of interest 
mined data from sequenced genomes of model 
grasses, mainly rice, Brachypodium and sor- 
ghum. Efficiency of this synteny-based approach 
was compromised by limitations in designing 
gene-derived primers with sufficient specificity 
to distinguish homoeologous genes in polyploid 
wheat. Using amplified DNA from individual wheat
chromosome arms as a template for locus-specific
PCR and subsequent amplicon sequencing
significantly increased the efficiency of the
procedure and facilitated the targeted generation
of gene-associated SNP markers in a time- and
cost-effective manner (Jakobson et al. 2012;
Michalak de Jimenez et al. 2013; Terracciano
et al. 2013; Staňková et al. 2015). Additionally,
particular chromosomal arms used as a PCR
template were applied to validate the specificity
of the newly designed markers (Staňková et al.
2015; Janáková et al. 2019).

Advancement in marker development came
along with the release of ‘CHINESE SPRING’
CSSs and genome zippers that provided information on
putative gene content and order in the region of
interest in the reference genome. Nevertheless,
studies comparing shotgun sequences of CS 
chromosomes with those of other wheat acces- 
sions revealed extensive intra- and interchro- 
mosomal rearrangements in CS (Ma et al. 2014, 
2015; Liu et al. 2016), implying limitations in 
the transferability of data from the wheat refer- 
ence to other genomes. Moreover, it became 
obvious that agronomically important traits 
were frequently controlled by rare, genotype- 
specific alleles or had even been introgressed into
wheat from its relatives. Under such a scenario,
genetic maps had to be created from a mapping
population derived from a donor of the
trait and sequence information from the donor 
was essential for marker development. As a 
proof-of-concept experiment, Shatalina et al. 
(2013) generated tenfold coverage of Illumina 
data from chromosome 3B isolated from wheat 
cultivars ARINA and FORNO—the parents of 
their mapping population. Relying on synteny
with the Brachypodium genome, they identified
sequences close to coding regions and used
them to develop 70 SNP markers, which were
found dispersed over the entire 3B chromosome
and contributed to a fourfold increase in the
number of available markers. The new markers
were utilized for mapping a QTL conferring
resistance to Stagonospora nodorum glume
blotch located on 3BS (Shatalina et al. 2014). 
Chromosome sequencing was then applied 
by other groups to fine-map Yellow Early 
Senescence 1 (Harrington et al. 2019), leaf rust 
resistance gene Lr49 (Nsabiyera et al. 2020) and 
powdery mildew resistance gene Pm1a (Hewitt
et al. 2021). 

The procedure was also adopted to develop
markers in species from the wheat tertiary gene
pool, such as Ae. geniculata (Tiwari et al.
2014) and H. villosa (Wang et al. 2017; Zhang
et al. 2021), with the aim of tracing the alien
chromatin in the wheat background. For this
purpose, the method was refined by Abrouk
et al. (2017), who developed an in silico pipeline
termed Rearrangement Identification and
Characterization (RICh). To delimit a segment
transferred from T. militinae to the long arm of
chromosome 4A of bread wheat cv. TÄHTI,
the authors generated a virtual gene order of
TÄHTI chromosome 4A. Comparison of
homoeologous gene density between the 4AL arm
of CS and the arm with the introgression, which
harbored the powdery mildew resistance locus
QPm.tut-4A, identified alien chromatin with
169 putative genes originating from T. militinae.
A similar approach was used by Bansal
et al. (2020) to fine-map leaf rust and stripe rust 
resistance genes Lr76 and Yr70 introduced from 
Ae. umbellulata. The authors sequenced flow- 
sorted chromosomes 5U from Ae. umbellulata, 
5D from a bread wheat-Ae. umbellulata intro- 
gression line and 5D from the recurrent parent. 
Sequencing reads were explored with the aim 
to identify introgression-specific SNP markers 
whose projection on the IWGSC RefSeq v1.0 
sequence (IWGSC 2018) delimited the intro- 
gression to a 9.47 Mb region, in which candi- 
dates for Lr76 and Yr70 genes were identified. 
Konkin et al. (2022), striving to identify genes
for resistance to several fungal pathogens,
including Fusarium head blight, sequenced the 7EL
telosome, originating from Thinopyrum elongatum
and present as an addition in CS wheat. They
thus built a reference for comparative transcriptome
analysis between CS and the CS-7EL addition
line, which resulted in a list of candidate genes
for the resistance.
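
The step of projecting introgression-specific SNPs onto a reference sequence to delimit an alien segment can be illustrated as follows. This is a hypothetical sketch and not the method of any study cited above: it assumes that SNP positions on the reference chromosome are already known, counts them in fixed windows, and reports the longest run of consecutive windows that reach a minimum count. The window size, the threshold and all names are arbitrary choices made for the example.

```python
def delimit_introgression(snp_positions, chrom_len, window=1_000_000, min_snps=5):
    """Return (start, end) of the longest run of SNP-dense windows, or None.

    min_snps is assumed to be at least 1.
    """
    n_windows = -(-chrom_len // window)  # ceiling division
    counts = [0] * n_windows
    for pos in snp_positions:
        if 0 <= pos < chrom_len:
            counts[pos // window] += 1

    best = None
    run_start = None
    # A trailing zero acts as a sentinel that closes a run reaching the end.
    for i, count in enumerate(counts + [0]):
        if count >= min_snps:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if best is None or i - run_start > best[1] - best[0]:
                best = (run_start, i)
            run_start = None
    if best is None:
        return None
    return best[0] * window, min(best[1] * window, chrom_len)


# Toy data: dense SNPs in windows 3-5, a single stray SNP in window 8.
toy_snps = ([3_000_000 + 1000 * k for k in range(6)]
            + [4_000_000 + 1000 * k for k in range(5)]
            + [5_000_000 + 1000 * k for k in range(7)]
            + [8_500_000])
segment = delimit_introgression(toy_snps, chrom_len=10_000_000)
print(segment)
```

Isolated SNPs outside the dense block are ignored by the threshold, which reflects the expectation that an introgressed segment shows up as a contiguous region of elevated divergence rather than as scattered differences.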

Alongside the wheat chromosomal survey
sequences, emerging BAC assemblies from individual
chromosomes of ‘CHINESE SPRING’,
as well as customized chromosomal BAC libraries
from other cultivars, proved instrumental
in gene cloning projects. Šimková et al. (2011)
demonstrated that BAC libraries constructed
from chromosome arms 7DS and 7DL, consisting
of tens of thousands of BAC clones, were
highly representative and easy to screen, which
facilitated fast chromosome walking in the region
of the green bug resistance gene Gb3 on 7DL. The
7DS BAC library was screened for markers 
tightly linked to a Russian wheat aphid resist- 
ance locus Dn2401 (Staňková et al. 2015) and 
a BAC contig spanning the locus was identified 
in a 7DS physical map (Tulpová et al. 2019a). 
BAC clones from a 0.83 cM interval, delimited by
Dn2401-flanking markers, were sequenced by a
combination of short Illumina and long nanopore
reads and the resulting sequence assembly,
validated by optical mapping of the 7DS
arm (Staňková et al. 2016), revealed six high- 
confidence genes. Comparison of 7DS-specific 
optical maps prepared from susceptible cv. 
CHINESE SPRING and resistant line CI2401 
revealed structural variation in proximity of 
Epoxide hydrolase 2, which gave support to 
the gene as the most likely Dn2401 candidate 
(Tulpová et al. 2019b). Similarly, a BAC library
and physical map of the CS 4A chromosome were
used to approach and analyse the pre-harvest sprouting
resistance locus Phs-A1, which revealed a
causal role of TaMKK3-A for the trait (Shorinola
et al. 2017). Customized BAC libraries constructed
from the 3B chromosome of cv. HOPE
and the 4AL telosome bearing an introgressed segment
of T. militinae were utilized to clone the stem
rust resistance gene Sr2 (Mago et al. 2014) and
to approach the powdery mildew resistance locus
QPm.tut-4A (Janáková et al. 2019), respectively.


3.5.2 Contemporary Approaches 


The completion and release of the ‘CHINESE
SPRING’ reference genome (IWGSC 2018), hand
in hand with rapid technological advancements
allowing resequencing and large-scale pan-genome
projects even in a crop with a complex
polyploid genome, revolutionized strategies of
gene cloning in bread wheat. Whole-genome
long-read sequencing, resulting in high-qual- 
ity sequence with resolved gene duplications, 
became realistic for wheat but challenges of 
producing, handling and analyzing the big data 
still appear too high for the majority of wheat 
gene cloning projects. Apart from the WGS 
and pan-genome efforts, several approaches 
to rapid gene cloning have been developed 
(Bettgenhaeuser and Krattinger 2019, and Chap.
10 of this book), including several utilizing the
complexity reduction by chromosome flow
sorting. Among them, Mutant Chromosome
Sequencing (MutChromSeq; Sánchez-Martín
et al. 2016) and TArgeted Chromosome-based
Cloning via long-range Assembly (TACCA;
Thind et al. 2017) have been used most widely.
As indicated by the acronym, the former method 
couples chromosome flow sorting and sequenc- 
ing with reference-free forward genetics. A 
chromosome bearing the gene of interest is 
Illumina-sequenced from both wild type and sev- 
eral independent ethyl methanesulfonate (EMS) 
mutants and the sequences are compared. A can- 
didate gene is identified based on overlapping 
mutations in a genic region. The feasibility and 
efficiency of the method were first demonstrated 
by re-cloning barley Eceriferum-q gene and by 
de novo cloning wheat powdery mildew resist- 
ance gene Pm2 (Sánchez-Martín et al. 2016). 
This speedy, cost-efficient approach to gene
cloning generated a lot of interest in both the wheat
and barley communities (reviewed in Steuernagel
et al. 2017). It was successfully applied to identify
the semi-dwarfism locus Rht18 in T. durum
(Ford et al. 2018) and the SuSr-D1 gene that
suppresses resistance to stem rust in bread wheat
(Hiebert et al. 2020). Moreover, it contributed to
cloning the race-specific leaf rust resistance gene
Lr14a (Kolodziej et al. 2021) and the powdery
mildew resistance gene Pm4 (Sánchez-Martín
et al. 2021) from hexaploid wheat.
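
The selection logic of MutChromSeq, finding a region that carries a mutation in each of several independent mutants, can be sketched as below. This is a simplified illustration and not the published pipeline: it assumes that variant calling against the wild-type chromosome assembly has already been done, that each mutation is reported as a tuple of hypothetical fields (mutant, contig, reference base, alternative base), and that only the G-to-A and C-to-T transitions typical of EMS are of interest. Deletions and read-depth filters, which a real analysis would need to consider, are left out.

```python
from collections import defaultdict

# EMS predominantly induces G>A and C>T transitions.
EMS_TRANSITIONS = {("G", "A"), ("C", "T")}


def candidate_contigs(mutations, min_mutants):
    """Contigs carrying an EMS-type mutation in at least min_mutants mutants.

    mutations: iterable of (mutant_id, contig_id, ref_base, alt_base)
    """
    mutants_per_contig = defaultdict(set)
    for mutant, contig, ref, alt in mutations:
        if (ref.upper(), alt.upper()) in EMS_TRANSITIONS:
            mutants_per_contig[contig].add(mutant)
    return sorted(contig for contig, mutants in mutants_per_contig.items()
                  if len(mutants) >= min_mutants)


toy_calls = [
    ("m1", "contig_7", "G", "A"),
    ("m2", "contig_7", "C", "T"),
    ("m3", "contig_7", "G", "A"),
    ("m1", "contig_2", "G", "A"),
    ("m2", "contig_2", "A", "G"),  # not an EMS-type change, ignored
    ("m3", "contig_9", "C", "T"),
]
candidates = candidate_contigs(toy_calls, min_mutants=3)
print(candidates)
```

Because background EMS mutations are scattered at random, the chance that unrelated contigs are hit in every independent mutant falls quickly as mutants are added, which is why a handful of mutants can be enough to single out a candidate.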
MutChromSeq is a method of choice for 
traits with a strong phenotype, for which the 
production of independent mutants is feasi- 
ble. As an alternative, suitable for any pheno- 
type, Thind et al. (2017) proposed a procedure 
based on producing a high-quality de novo 
assembly of the gene-bearing chromosome and 
named it TACCA. The procedure utilized the 
so-called Chicago mapping technique (Putnam
et al. 2016) developed by Dovetail Genomics.
To clone leaf rust resistance gene Lr22a, the 
authors flow-sorted and Illumina-sequenced 
wheat chromosome 2D from resistant line CH 
CAMPALA Lr22a. The resulting sequences 
were scaffolded with Chicago long-range link- 
age. The assembly comprised 10,344 scaffolds 
with an N50 of 9.76 Mb and with the longest 
scaffold of 36.4 Mb. The high contiguity of the 
chromosomal assembly significantly reduced 
the number of markers needed to delimit the 
gene in a narrow interval and, complemented by
information from EMS mutants, allowed rapid 
cloning of this broad-spectrum resistance gene. 
The TACCA approach was also applied by Xing 
et al. (2018) to clone powdery mildew resistance 
gene Pm21, introduced to bread wheat from H. 
villosa chromosome 6V. Besides, the quality 
chromosomal assemblies generated by long- 
range linkage were used for comparative analy- 
ses with chromosomes of the wheat reference 
genome (Thind et al. 2018; Xing et al. 2021). 


3.6 Conclusions and Perspectives 
Since its establishment in 2000, flow-cytometric
chromosome sorting has contributed to major
achievements in bread wheat genomics, including
the generation of the wheat reference genome.
Due to the rapid advancements in next-generation
sequencing technologies, the reduction of genome
complexity is no longer essential in the context of
whole-genome sequencing, but remains beneficial
in gene cloning projects that call for a high-quality
sequence from a narrow region of the genome.
This demand was met by coupling chromosome
sorting with the long-range linkage method,
which resulted in contiguous chromosome assemblies.
Since Dovetail Genomics discontinued
the Chicago method, other approaches need to
be developed to satisfy the demand of the wheat
community. Long-read sequencing technologies,
such as PacBio or nanopore sequencing, appear to
be the logical tools for achieving the goal, but to
make them compatible with flow-sorted material,
challenges relating to inherent features of the
flow sorting technique, namely formaldehyde fixation
and the laboriousness of producing large DNA
amounts, still need to be resolved. Low-input
protocols, being developed by the sequencing
companies, address this demand.
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Wheat genome sequencing has passed through 
major steps in a decade, starting from the 
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obtained from chromosome-specific BAC 
reach high-quality genome 


libraries, to 


F. Choulet (24) - H. Rimbert - P. Leroy - 

P. Lasserre-Zuber - N. Papon 

UCA, INRAE, GDEC, Clermont-Ferrand, France 
e-mail: frederic.choulet  inrae.fr 


H. Rimbert 
e-mail: helene.rimbert @inrae.fr 


P. Leroy 
e-mail: philippe.leroy.2 G inrae.fr 


P. Lasserre-Zuber 
e-mail: pauline.lasserre-zuber @inrae.fr 


N. Papon 
e-mail: nathan.papon @inrae.fr 


X. Wang 

BASF Belgium Coordination Center Commv, Trait 
Research, Gent Zwijnaarde, Belgium 

e-mail: xi.wang @basf.com 


M. Spannagl 

PGSB Plant Genome and Systems Biology, 
Helmholtz Zentrum Miinchen, German Research 
Center for Environmental Health, Neuherberg, 
Germany 

e-mail: manuel.spannagl @helmholtz-muenchen.de 


D. Swarbreck 

Earlham Institute, Norwich Research Park, Norwich, 
Norfolk, UK 

e-mail: david.swarbreck @ earlham.ac.uk 


@ The Author(s) 2024 


assemblies of a dozen of bread wheat varieties 
and wild relatives. While access to an assem- 
bled genome sequence is crucial for research, 
the resource that is mainly used by the com- 
munity is not the sequence itself, but rather the 
annotated features, i.e., genes and transposable 
elements. In this chapter, we describe the work 
performed to predict the repertoire of 107k 
high-confidence genes and 4 million TE copies
in the hexaploid wheat genome (cultivar
CHINESE SPRING; IWGSC RefSeq) and the 
procedures established to transfer the annotation 
through the different releases of genome assem- 
bly. Limitations and implications for building a 
wheat pangenome are discussed, as well as the 
possibilities for future improvements of struc- 
tural annotation, and opportunities offered by 
novel approaches for functional annotation. 


4.1 Introduction


The International Wheat Genome Sequencing
Consortium (IWGSC; http://www.wheatgenome.org)
was launched in 2005 with the aim
of accelerating research in wheat by delivering
molecular markers and genomic resources, with
the long-term goal of obtaining a high-quality
reference genome sequence for hexaploid
wheat (Feuillet and Eversole 2007). It represents
more than a decade of coordinated efforts, from
the completion of the first chromosome-specific
BAC library (Paux et al. 2008) to
the assembly of the 21 chromosome sequences
of cultivar CHINESE SPRING (IWGSC 2018).
Since the first release in 2018, the IWGSC has
integrated additional information from optical
mapping and long reads in order to improve
the quality of the assembly by correcting
misordered scaffolds and filling gaps. This led to
the release of RefSeq v2.0 and v2.1 in 2021
(Zhu et al. 2021).

Beyond the methodological challenge of
assembling this genome, the work performed
to deliver an annotation is not well known and
is often underappreciated. Annotation consists of
the identification of sequence features providing
biological information, and it represents one of
the most difficult tasks in genome sequencing
projects. It is far from trivial. Yet the
annotation, rather than the genome sequence
itself, is the data most often accessed by users.
Achieving a robust structural and functional
annotation of the genome sequence is thus
essential to provide the foundation for further
relevant biological studies (Yandell and Ence 2012).
Annotation of RefSeq v1.0 required the coordinated
effort of the IWGSC Annotation Group, bringing
together researchers from three different
institutes: GDEC (France), PGSB (Germany),
and Earlham Institute (UK). After
the first release of the annotation, additional
work was performed to incorporate
manual curation and, especially, to update
the annotation following changes to the genome
assembly. This was achieved by developing
fine-tuned bioinformatics approaches.

In this chapter, we present an overview of
the processes that were established in order
to release the first version of the annotation of
RefSeq v1.0 and the updates made since then.
Besides describing the work performed,
this chapter also offers an opinion on
the degree of approximation and the limits
of the resources available and used for
downstream analyses, and thus a critical view of
the quality of the data. The chapter also includes
the plans for future versions, not only for the
structural annotation but also for the functional
annotation.


4.2 Methods, Strategies, Resources
for Structural Annotation
of Genomes and Their
Implications in Wheat
Pangenomics

4.2.1 General Aspects
of Structural Annotation


Depending on the sequence features targeted for
study, and depending on the organism, genome
annotation can be either trivial or complicated.
This may confuse non-experts, who may
believe that annotation is routine
in genome sequencing projects. This is not the
case for many species, and it was certainly
not the case for wheat. For instance, in compact
bacterial genomes, coding genes are intronless
and represent the vast majority of
the genome, so that predicting the presence of
coding open reading frames is straightforward and
does not even require human curation. For species
that are already widely studied, such as human,
with several genomes already assembled
and annotated, annotation may be routine
since it is based purely on similarity with
available highly conserved genomes. The difficulty
of annotation increases with the size of the
genome, the repeat content and active transposable
element (TE) expression, the ploidy, the
fragmentation of coding genes into small exons,
and the phylogenetic distance to an already
well-characterized genome. The difficulty also
depends on the level of conservation of the
predicted features. A protein-coding gene highly
conserved among distant species will be easily
predicted with high confidence, while predicting
poorly conserved features with a high level of
accuracy is more complicated.

Annotation relies on a combination of
approaches: (i) homology-based methods,
which use alignment/mapping algorithms to search
for sequence similarity with proteins,
showing that a sequence is conserved across
evolution, and/or with transcriptomic data, showing
that a sequence is expressed; (ii) ab initio
methods, i.e., predictions using statistical models
such as hidden Markov models (HMMs); and
(iii) structural feature-based methods, through the
identification of intrinsic information like motifs
at the borders of transposons. It thus relies on
a combination of software, algorithms, and
adapted reference libraries. Annotation needs to
be automated, i.e., performed through a pipeline
that combines all the different programs and
minimizes the subsequent long and laborious step of
manual curation.


4.2.2 Sequence Features Usually 
Annotated and Common 
Ambiguities 


In plant genomics, publications usually
report on genes and repeats. Both terms
are, however, confusing, and the shortcut widely
accepted by the community to distinguish
genes from repeats is ambiguous. First, for
convenience, the term “gene” is used as a
shortcut for protein-coding gene. This will be the
case in this chapter too. When a “number of genes”
is given, it nearly always refers to a number of
protein-coding genes. However, genomes also
carry non-coding RNA (ncRNA) genes, which
are biologically important. In annotation,
we distinguish two types of non-coding
RNA genes: (i) highly conserved ncRNAs
involved in essential cellular processes (splicing,
translation), which are ribosomal RNAs,
transfer RNAs, small nuclear and nucleolar
RNAs, and (ii) less evolutionarily conserved
ncRNAs like micro-RNAs, long non-coding
RNAs, and others involved in specific regulation
processes. Annotating conserved and
non-conserved ncRNAs follows two completely
different approaches. rRNAs, tRNAs, snoRNAs,
and snRNAs are easily identified by a simple
similarity-search approach; however, they tend not
to be annotated. The reason is probably
that they are of interest only to research
groups working specifically on them, which
are able to identify them with specific tools. In
contrast, annotation is much more complicated
for the species-specific ncRNAs. It requires the
availability of small RNASeq reads that can
be mapped to identify transcribed regions as
a first clue before concluding that an ncRNA
gene is present. Second, genes are repeats.
In bread wheat, the majority of the "genes"
are repeated, with only 17% (30,948/181,036)
being single-copy genes (IWGSC 2018), so referring
to genes versus repeats brings confusion,
particularly when some repeats carry genes.
"Repeats" is a general term encompassing simple
repeats such as satellite DNA and telomeric
repeated motifs, but also transposable elements
(TEs) and their mobilizable or inactive derivatives.
Usually in plant genome annotation, the term TE
is used to describe all elements whatever their
status: autonomous, non-autonomous, transposable,
mobilizable, or inactive. TEs can carry
genes and/or pseudogenes that encode proteins
involved in transposition. In species like wheat,
where the genome is massively composed of
TEs, it is essential to identify them to avoid
calling genes that are in fact derived from TEs and,
thus, are/were involved in transposition rather
than in a function related to a phenotype and under
selection pressure.

The problems described above limit our ability
to determine with high confidence whether a
sequence is a functional protein-coding gene,
a pseudogene, or part of a TE. In addition, the
lack of evidence sometimes limits our ability
to precisely determine the structure of a gene.
Positions of the start codon and borders between
coding exons and introns can remain doubtful in
many cases. Transcriptomic data like RNASeq
are extremely useful to determine exon/intron
borders, the existence of alternative transcripts,
and the extent of untranslated regions (UTRs of
the mRNA, upstream of the start and downstream
of the stop codons). Fixing the start codon position,
however, often requires protein sequence homology.
Usually in whole-genome annotation projects,
the most important task for each gene is to
predict the coordinates of the CDS features (i.e.,
the coding exons). With RNASeq, it became
routine to also annotate the positions of UTRs
and all alternatively spliced mRNAs, while
defining one representative mRNA/CDS per
gene (usually the longest or the most conserved
with other species, numbered “1” by convention).
For low or non-expressed genes, UTR
and mRNA coordinates may not be predicted
because of a lack of information. In that case,
the gene coordinates are limited to the CDS,
which remains the basic essential annotation for
a protein-coding gene. For wheat, our main goal
was to predict CDSs first and, if possible, to add
the layer of UTRs and transcripts, the latter
being highly dependent on the RNASeq
samples available and methods used.

Wheat gene models have been assigned a
confidence category, namely high versus low
confidence (HC, LC). This could be misleading
since confidence may relate either to the existence
of a gene or to its exon/intron structure.
For instance, one can be highly confident
that a sequence encodes a gene while weakly
confident in its exact exon coordinates. The two
are related. Doubt about the existence of a gene at a
given locus is associated with a lack of homology
evidence. In RefSeq v1, the HC/LC categories
classified genes based on their level of similarity
(complete or partial) with proteins from other
plants. The consequence is that HC genes are
likely functional and conserved among Poaceae,
even if some might be predicted with a doubtful
structure. LC genes share partial similarity with
known proteins and can be well-defined functional
genes, but the qualitative judgment is of
low confidence.
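
To make the distinction concrete, the following
minimal Python sketch assigns HC or LC from the
coverage of protein alignments. It is an illustration
only: the ProteinHit fields, the confidence_class
function, and the 0.9 coverage cutoff are assumptions
made for this example and are not the criteria that
were actually applied to RefSeq v1.

```python
from dataclasses import dataclass

# Hypothetical cutoff: fraction of both sequences that must be aligned
# for the similarity to count as "complete".
COMPLETE_COVERAGE = 0.9


@dataclass
class ProteinHit:
    model_coverage: float      # aligned fraction of the predicted wheat protein
    reference_coverage: float  # aligned fraction of the reference plant protein


def confidence_class(hits):
    """Return 'HC' if any hit is complete on both sides, otherwise 'LC'."""
    for hit in hits:
        if (hit.model_coverage >= COMPLETE_COVERAGE
                and hit.reference_coverage >= COMPLETE_COVERAGE):
            return "HC"
    return "LC"


complete = [ProteinHit(0.95, 0.97)]
partial = [ProteinHit(0.95, 0.40)]
print(confidence_class(complete), confidence_class(partial))
```

In such a scheme, a model supported only by a partial
alignment stays in the LC class even when it is a
functional gene, which is the caveat raised above.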

Refinement of automated annotation pipelines
to deal with the LC "challenge" is expected
to involve manual curation by experts. Manual
curation is required to improve the overall
quality of the automated annotation. However,
manual curation may be mistakenly considered
as a validation. Both computer and human
algorithms make a decision based on a priori
knowledge of the structure of genes and on
homology information. When the decision is
obvious, typically for widely conserved genes
with consistent homology to known proteins and
mapped transcripts, human curation is not
needed. When homology is weak or partial and
transcription evidence is lacking, manual curation
does not achieve high confidence
in either the existence of a gene or its
structure. Curation has a positive impact only in
particular cases: missing genes (with evidence
slightly under default thresholds), chimeric tandem
duplicated genes, start codon mis-assignment,
and correction of gene models that are
in fact pseudogenes because they are truncated or
carry frameshift mutations. These are all particular
cases where the situation deviates from the standard
and is too complex for algorithms.

For TEs, especially in large genomes, manual
curation has a much stronger impact than it has
for genes. Automated TE modeling is extremely
complicated in genomes like wheat, where TEs
cover 85% of the genome. The history of nested
insertions of young elements into old ones has
shaped a highly fragmented mosaic of TEs. For
instance, manual curation led to the identification
of blocks of nested TEs in which the two
extremities of the older element are separated by
>200 kb (Choulet et al. 2010). Such reconstruction
is a computational challenge, and manual curation
still has a major impact on the quality of the TE
annotation. However, with around 4 million TEs
in the wheat genome, manual curation has so far
been limited to small regions.


4.2.3 TEs Versus Genes: The Crucial 
Point of Having a Manually 
Curated TE Library 


Providing the complete (protein-coding) gene
catalog of a sequenced genome is the priority
of annotation. The impact of our knowledge
about TEs on our ability to determine whether
an ORF is part of a functional gene or is
a TE-related ORF is illustrated in rice, where
the first releases in 2002 over-predicted around
50,000 genes (Goff et al. 2002; Yu et al. 2002;
Bennetzen et al. 2004) because of unknown TEs.
In the wheat context, in the first release (RefSeq
v1.1), the predicted CDSs represented 143 Mb
[i.e., 107,891 HC genes; (IWGSC 2018)], which
is not even 1% of the genome versus 85% for
TEs. If even only 5% of the TEs were not
correctly identified,
the amount of "TE-related ORFs" considered
as potential functional genes would exceed the
total number of predicted genes. Consistent with
such a high degree of uncertainty was the initial
number of 908,149 candidate loci (after filtering
out TE-matching loci) that matched
transcripts and/or homologous proteins in the
wheat draft genome annotated in 2014 (IWGSC
2014). RNASeq analysis highlighted 976,962
potentially expressed loci in this study (generating
polyA-tailed transcripts), a number considered
to be well in excess of what
was expected based on studies in model grasses.
Releasing an annotation that is a good representation
of the biological reality is therefore a
challenge, and the availability of a curated TE
library is of major importance since it could
filter out thousands of mis-called genes.
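
The orders of magnitude behind this argument can
be checked with a few lines of Python. The TE
fraction, the CDS length, and the 5% scenario are
taken from the text; the genome size of 14.5 Gb is
an assumption used here for illustration, and the
comparison is made in sequence length rather than
in number of loci.

```python
GENOME_BP = 14.5e9          # assumption: approximate size of the assembly
TE_FRACTION = 0.85          # from the text
CDS_BP = 143e6              # from the text (HC genes, RefSeq v1.1)
MISSED_TE_FRACTION = 0.05   # the "5%" scenario from the text

te_bp = GENOME_BP * TE_FRACTION
missed_te_bp = te_bp * MISSED_TE_FRACTION   # TE sequence left unmasked
cds_share = CDS_BP / GENOME_BP              # share of the genome that is CDS
ratio = missed_te_bp / CDS_BP               # unmasked TE sequence per bp of CDS

print(f"CDS share of the genome: {cds_share:.2%}")
print(f"Unmasked TE sequence: {missed_te_bp / 1e6:.0f} Mb")
print(f"Unmasked TE sequence relative to CDS: {ratio:.1f}x")
```

Under these assumptions, the TE sequence escaping
identification is several times larger than the
whole predicted coding space, which is why even a
small error rate in TE detection matters.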

In the development of a representative wheat
genome sequence, the long-standing effort to
build a high-quality curated TE library has
provided a sound foundation. From the beginning
of BAC sequencing in wheat, barley, and
related Triticeae, which all share common TE
families, several groups around the world have
contributed to the manual annotation of TEs while
defining their exact borders (by searching for
terminal repeated motifs). These TEs were
organized, classified, and distributed through the
Triticeae Repeat (TREP) library maintained by
Thomas Wicker at Zurich University, a resource
extremely useful for masking TEs, a common
task in genome annotation in which nucleotides
assigned to TEs are converted to Ns (or to
lowercase). In 2010, the first large contiguous
wheat sequences (obtained from BAC-contigs)
were published, representing 18 Mb (Choulet
et al. 2010). Although this accounted for only 0.1%
of the genome, it doubled the amount of wheat
sequence available at that time. Even though
our knowledge of the wheat genome was still
extremely partial, similarity-searches against
TREP already identified 75% of the sequence as
TEs. This early work demonstrated that manual
annotation of a small fraction of the genome
allowed the identification of all the abundant,
highly repeated TE families that comprise most
of the genome. It also revealed that CACTAs
were underrepresented in the library, contrary
to LTR-Retrotransposons (LTR-RTs) Gypsy/
Copia. The main reason is that the level of
variability/diversity of LTR-RTs is low compared
to CACTAs. This impacts TE annotation/masking
because similarity-search (at low
stringency) allows cross-matching between
LTR-RT families, meaning that it is not necessary
to have identified all families to mask the
unknown ones. In contrast, for CACTA families,
similarity between families is often limited
to the extremities of the element while the
internal part is much more variable. This is why
a special effort was made, in 2010, to manually
curate 3222 elements, especially 330 CACTAs,
in order to enrich the wheat TE library (Choulet
et al. 2010). This increased the proportion of
predicted TEs from 75 to 85% of the
genome. In 2014, these ca. 3200 new elements
were combined with TREP and classified de novo,
and a more exhaustive library called ClariTeRep
was established (Daron et al. 2014). ClariTeRep
is mostly enriched in CACTAs compared to the
original TREP library and has a clear impact on
TE annotation of Triticeae genomes. Several
Triticeae sequencing projects concluded that
CACTAs represent 5-6% of the genome (Jia
et al. 2013; Ling et al. 2013), while their proportion
is around 15% based on ClariTeRep.
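
As an illustration of masking, the sketch below
converts the nucleotides covered by TE hits either
to Ns (hard masking) or to lowercase (soft masking).
The function name mask_intervals and the 0-based,
end-exclusive coordinate convention are assumptions
made for this example; real pipelines rely on
dedicated tools for this step.

```python
def mask_intervals(sequence, intervals, hard=True):
    """Mask 0-based, end-exclusive (start, end) intervals in a DNA string.

    hard=True replaces masked bases with 'N'; hard=False lowercases them.
    """
    chars = list(sequence)
    for start, end in intervals:
        # Clip each interval to the sequence bounds before masking.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = "N" if hard else chars[i].lower()
    return "".join(chars)


seq = "ACGTACGTACGT"
te_hits = [(2, 5), (8, 10)]      # hypothetical TE coordinates

hard_masked = mask_intervals(seq, te_hits)
soft_masked = mask_intervals(seq, te_hits, hard=False)
print(hard_masked)   # ACNNNCGTNNGT
print(soft_masked)   # ACgtaCGTacGT
```

Soft masking keeps the underlying sequence available
to downstream tools that can ignore lowercase, whereas
hard masking removes it altogether.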


4.2.4 Ab Initio, Homology-Based 
Predictions, and the RNASeq 
Revolution for Gene Calling 
in Complex Genomes 


Pipelines for automated structural annotation
usually need to combine information from
ab initio predictors and evidence of similarity
with known proteins in other species or with
transcriptome sequences (ESTs, full-length cDNAs,
RNASeq [short reads], IsoSeq [long reads]
data). For large genomes like that of wheat, the
problem with ab initio predictors is the very high
number of false positives. Indeed, since TEs are
estimated to cover at least 85% of the genome,
while genes would cover 1-2%, the remaining
13-14% of unannotated DNA accounts for
approximately 2 Gb where gene finders predict
gene models because of the presence of
ORFs that look likely to be coding. The reason is
that the unannotated part is shaped by low-copy
TE-derived sequences, old TE relics, not identified
with default TE identification approaches,
that carry ORFs that are/were coding (e.g., a
fragment of a transposase) and thus are mistakenly
recognized by gene predictors.
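
A toy example helps to see why an ORF alone is weak
evidence. The sketch below reports every ATG-to-stop
stretch on the forward strand above a minimum length,
whether it lies in a gene or in a degraded
transposase. It is a deliberately naive scanner
written for this illustration (the function name
forward_orfs and the min_codons parameter are
assumptions); actual gene finders use far richer
statistical models.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def forward_orfs(sequence, min_codons=30):
    """Yield (start, end) of ATG...stop ORFs on the forward strand.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    min_codons counts the codons from the ATG up to, but excluding, the stop.
    """
    seq = sequence.upper()
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos                      # open an ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    yield (start, pos + 3)
                start = None                     # close it at the first stop


toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
toy_orfs = list(forward_orfs(toy, min_codons=3))
print(toy_orfs)
```

Nothing in this procedure distinguishes a gene from
a TE relic, which is why masking and homology
evidence are needed before such candidates are kept.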

Because of the TE-derived ambiguity, biological
evidence of homology with related species
has always been the criterion of choice to
accurately predict genes in wheat. The drawback
for wheat was that the number of related
species with a sequenced genome was limited,
among the Poaceae, to Oryza sativa, Zea mays,
Sorghum bicolor, and Brachypodium distachyon.
Outside the Poaceae (common ancestor
60 MYA), sequence similarity is too weak to
ensure accurate homology-based predictions.
This raised a serious problem: wheat genes
conserved among the Poaceae were well predicted,
but our ability to predict less conserved genes,
especially species-specific genes,
was very limited at the early stages of annotation
before 2010.

Transcriptome sequencing considerably
enhanced our ability to determine which regions
of the genome carry genes because it provided
evidence of transcription. Transcriptome
sequencing started with a massive effort to
sequence millions of ESTs and full-length
cDNAs (Ogihara et al. 2004; Zhang et al.
2004) and was followed by the emergence of
RNASeq, which provided
unprecedented power to drive structural annotation.
The first use of an RNASeq expression atlas
for wheat gene annotation at the chromosome
scale was published in 2014 (Choulet et al.
2014; Pingault et al. 2015). In brief, 7264 gene
models were predicted but only 5185 (71%)
showed transcription evidence in an RNASeq
atlas covering five plant organs at three
developmental stages each. In addition, 3692
transcribed regions were detected in the unannotated
sequences, showing that 42% of the loci likely
expressed did not correspond to predicted
protein-coding genes. This indicated a high level
of uncertainty in describing biological reality
when annotating the wheat genome. In this
chapter, we propose a critical view of automated
gene annotation pipelines, namely that
bioinformatics can predict but not demonstrate
that a sequence is a gene and that a gene is not
a pseudogene. Although RNASeq became a primary
resource for structural annotation, the
correspondence between RNASeq-read mapping
loci and the final filtered gene set was far from
perfect, with 29% of chr3B gene models showing
no transcription evidence and 42% of transcribed
regions not looking like protein-coding
genes. Homology with related species remains
an important benchmark.
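
The percentages quoted above follow directly from
the reported counts, as the short calculation below
shows. The only assumption is the denominator used
for the last figure, taken here as the sum of the
expressed gene models and the transcribed regions
found in unannotated sequence.

```python
predicted_models = 7264            # gene models predicted on chromosome 3B
models_with_expression = 5185      # models with transcription evidence
novel_transcribed_regions = 3692   # transcribed regions outside gene models

supported = models_with_expression / predicted_models
unsupported = 1 - supported

# Assumption: "loci likely expressed" = expressed models + novel regions.
expressed_loci = models_with_expression + novel_transcribed_regions
not_gene_like = novel_transcribed_regions / expressed_loci

print(f"Models with transcription evidence: {supported:.0%}")
print(f"Models without transcription evidence: {unsupported:.0%}")
print(f"Expressed loci outside gene models: {not_gene_like:.0%}")
```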


4.2.5 Single-Gene Duplications Raise 
More Problems Than Polyploidy 
for Structural Annotation 


Given the weight of similarity-search with
transcripts and proteins in structural annotation,
intrinsic features of the genome significantly
affect how difficult it is to identify the correct
gene structure, since sequence alignments underpin
all the analyses. A first important intrinsic feature
that impacts annotation is the fragmentation
level, i.e., the number of exons per gene. The more
exons a CDS is split into, the more difficult
it is to predict the correct intron/exon structure.
In wheat, considering RefSeq
Annotation v2.1, the average number of exons
per CDS is only 4. Sixty percent of the CDSs
are split into a maximum of 3 exons, and
only 10% of the gene set corresponds to CDSs
split into ten exons or more. Thus, the
fragmentation problem is limited in wheat.

Other important criteria are the lengths of
exons and introns. Small exons might be missed
by sequence alignments because they fall under the
default thresholds of automated pipelines. Large
introns also raise problems for spliced-alignments.
In the current wheat annotation release,
the average exon length is 498 bp and the average
intron length is 280 bp (considering only
one representative transcript per gene). Thus,
exons are, on average, large enough for
high-scoring alignments, and introns are small
enough for efficient spliced-alignments. So,
although the wheat genome is often described
as complex, some of its intrinsic features are
rather less complex than in many other eukaryotes.
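
Such statistics are simple to derive once one
representative transcript per gene has been chosen.
The sketch below computes them from exon coordinates;
the function name exon_intron_stats, the input
layout, and the toy coordinates are assumptions made
for this example, not the procedure used for the
release.

```python
from statistics import mean


def exon_intron_stats(transcripts):
    """Summarize exon/intron structure.

    transcripts maps a transcript ID to its coding exons as sorted,
    1-based inclusive (start, end) pairs.
    """
    exon_counts, exon_lengths, intron_lengths = [], [], []
    for exons in transcripts.values():
        exon_counts.append(len(exons))
        exon_lengths.extend(end - start + 1 for start, end in exons)
        # An intron is the gap between two consecutive exons.
        intron_lengths.extend(
            next_start - prev_end - 1
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
        )
    return {
        "mean_exons_per_cds": mean(exon_counts),
        "mean_exon_length": mean(exon_lengths),
        "mean_intron_length": mean(intron_lengths) if intron_lengths else 0,
    }


toy = {
    "geneA.1": [(100, 399), (500, 799)],   # two 300-bp exons, one 100-bp intron
    "geneB.1": [(1000, 1599)],             # a single 600-bp exon
}
stats = exon_intron_stats(toy)
print(stats)
```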

Does polyploidy impact our ability to call
genes? The main problem with alignment-based
methods for gene calling is obviously multiple
mapping, i.e., the fact that a transcript/protein
matches at multiple loci along the genome. But
this does not mean at all that single-copy genes
are easier to predict than duplicated genes. On
the contrary, the fact that a gene is repeated on,
e.g., chromosomes 1A, 1B, and 1D because of
polyploidy rather favors accurate structural
annotation. Since each copy is carried by a
different chromosome, it is annotated independently,
and this does not generate problems due to
multiple mapping. The three subgenomes A-B-D
could be annotated as if they were three genomes
of three different species. If a gene copy is
silenced and thus does not generate an RNASeq
signal, reads coming from the copies that are
transcribed can be used to predict the structure of
all copies. So, again, in our opinion polyploidy
is an advantage here for structural annotation.
To go further, we can even consider that we did
not fully exploit the advantage provided by this
intrinsic redundancy of the genome for structural
annotation of the IWGSC RefSeq. We will present
this in more detail in the section below
describing future plans for improvements.

Large chromosomes such as those found in wheat
are usually fragmented into "chunks" that are
annotated independently in parallel. The problems
with multiple mapping arise when repeated
copies of a gene are carried by the same chunk.
This is typically the case for tandemly duplicated
genes. This is why automated structural
annotation of tandem duplicates is the most
complicated task. Single-gene duplications are
much more problematic than whole-genome
duplication (i.e., polyploidy). This is true for
every genome annotated mainly via the
homology-based approach. However, for wheat,
this problem has strong implications because
we demonstrated that single-gene duplications
intensively affected the gene repertoire during
its recent evolution (Glover et al. 2015). In
the IWGSC RefSeq v1.1, we found that 27%
of genes were present as tandem duplicates
(IWGSC 2018). Multiple mapping of homologous
proteins and transcripts on tandem duplicates
may artificially link exons from the
two copies and thus lead to the prediction of
chimeric genes. This is especially the case for
highly identical copies that are separated by a
small intergenic region, compatible with a
classical intron length. Some highly repeated gene
families, such as the kinase genes and disease
resistance genes, are well known to fall into this
category. Unfortunately, these genes are often the
favorite candidates to control phenotypes of
interest, and in that case, manual curation is a
required step to significantly improve the accuracy
of automated annotation.
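
Loci exposed to this risk can be flagged before
curation. The sketch below lists neighboring genes
of the same family and strand that are separated by
an intergenic gap short enough to be mistaken for an
intron. The Gene fields, the family label, and the
5 kb cutoff are hypothetical choices made for this
illustration.

```python
from dataclasses import dataclass

MAX_INTRON_LIKE_GAP = 5_000   # hypothetical cutoff, in bp


@dataclass
class Gene:
    gene_id: str
    chrom: str
    start: int      # 1-based inclusive
    end: int        # 1-based inclusive
    strand: str
    family: str     # hypothetical label, e.g., from protein clustering


def chimera_prone_pairs(genes, max_gap=MAX_INTRON_LIKE_GAP):
    """Return adjacent same-family, same-strand pairs with an intron-sized gap."""
    ordered = sorted(genes, key=lambda g: (g.chrom, g.start))
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        same_context = (left.chrom == right.chrom
                        and left.strand == right.strand
                        and left.family == right.family)
        gap = right.start - left.end - 1
        if same_context and 0 <= gap <= max_gap:
            pairs.append((left.gene_id, right.gene_id))
    return pairs


toy_genes = [
    Gene("g1", "chr1A", 1_000, 3_000, "+", "kinase"),
    Gene("g2", "chr1A", 3_500, 5_500, "+", "kinase"),    # 499 bp after g1
    Gene("g3", "chr1A", 90_000, 92_000, "+", "kinase"),  # far from g2
    Gene("g4", "chr1B", 1_000, 3_000, "+", "kinase"),    # homeolog, other chromosome
]
flagged = chimera_prone_pairs(toy_genes)
print(flagged)
```

Note that the copy on chr1B is never paired with
those on chr1A, in line with the point that
polyploidy itself does not create this problem.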


4.3 RefSeq V1.0 Structural
Annotation

4.3.1 The Impact of Annotation
Procedure on Gene
Predictions Is Very Strong


Sequencing the wheat genome has a long history.
Different initiatives have been launched following
the advances of sequencing technologies
to tackle the hexaploid genome and also
the genomes of the diploid and tetraploid relative
species. For CHINESE SPRING itself,
before completing RefSeq v1, a draft genome
assembly (named CSSs for chromosome survey
sequences) was released in 2014 (IWGSC 2014)
together with a chromosome-scale assembly of
the entire chromosome 3B using a BAC-by-BAC
approach, hereafter named “3B-BAC-2014”
(Choulet et al. 2014). In addition, another version
of the CHINESE SPRING genome, named TGACv1,
was produced and annotated in 2017
(Clavijo et al. 2017). Hence, when the annotation
of RefSeq v1 started, chromosome 3B
had already been annotated three times
independently: 3B-BAC-2014 with the TriAnnot
pipeline at GDEC Institute (Clermont-Ferrand,
France), CSS-3B-v2.2 at PGSB Institute
(Munich, Germany), and TGACv1 at Earlham
Institute (EI, Norwich, UK) with homemade
pipelines. Here, we compared these three gene
catalogs to get a sense of the impact of the
methods on the results released: among the 7264
CDSs predicted on 3B-BAC-2014, only 2696
(37%) and 1246 (17%) were strictly identical in
TGACv1 and CSSv2.2, respectively (sharing strictly
identical protein sequences). These percentages
appear extremely low if one considers that these
are three independent initiatives to
sequence/annotate the same genotype. It
demonstrates the impact of
the annotation procedure on the released gene
catalog as well as the possible impact of the
sequencing strategy and assembly quality.
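
A comparison of this kind can be sketched as a set
lookup on protein sequences. The function name
identical_protein_fraction and the toy catalogs are
assumptions made for this example; the last lines
simply recompute the shares from the counts reported
above.

```python
def identical_protein_fraction(reference, other):
    """Fraction of reference models whose protein occurs verbatim in other.

    Both arguments map a gene ID to its protein sequence.
    """
    other_sequences = set(other.values())
    shared = sum(1 for seq in reference.values() if seq in other_sequences)
    return shared / len(reference)


# Toy catalogs (hypothetical IDs and sequences).
catalog_a = {"a1": "MKTAY", "a2": "MSSLV", "a3": "MAAGE", "a4": "MGGRW"}
catalog_b = {"b1": "MKTAY", "b2": "MAAGEL"}   # b2 differs from a3 by one residue
toy_fraction = identical_protein_fraction(catalog_a, catalog_b)

# Counts reported in the text for chromosome 3B.
total_3b_bac = 7264
pct_tgac = 100 * 2696 / total_3b_bac
pct_css = 100 * 1246 / total_3b_bac
print(f"{toy_fraction:.2f} {pct_tgac:.1f}% {pct_css:.1f}%")
```

Strict identity is a demanding criterion: a single
residue of difference, for instance from a shifted
start codon, is enough for two models of the same
gene to count as different.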


4.3.2 Gene Annotation Through 
a Federated Approach 


Given the strong differences observed when
comparing results obtained by different groups,
the IWGSC established an Annotation Working
Group in order to coordinate the efforts and
establish an integrated approach to annotate
RefSeq v1. Genes were predicted independently
by two groups, GDEC and PGSB, using two
different pipelines and two different strategies.
Both predictions were then integrated at EI to end
up with a single annotation. This led to v1.0, which
was quickly updated into v1.1 after integrating
~4000 manually curated genes (see below
for details on curation).

In v1.1, 107,891 high-confidence (HC)
protein-coding loci were identified, with a
relatively equal distribution across the A, B, and
D subgenomes (35,345, 35,643, and 34,212,
respectively). In addition, 161,537 other
protein-coding genes were classified as low-confidence
(LC) genes, representing partially supported
gene models, gene fragments, and orphans. On
ChrUn (unplaced scaffolds), 2691 HC and 675
LC gene models were identified. Evidence for
transcription was found for 85% (94,114) of the
HC genes versus 49% of the LC genes. In addition,
303,818 pseudogenes were also annotated.
The quality of RefSeq Annotation v1.1 was
estimated with BUSCO v3 (24). It revealed that
99% (1436/1440) of the BUSCO v3 genes were
present in at least one complete copy and 90%
(1292/1440) in three complete copies.
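
As a consistency check on these figures, the short
calculation below sums the HC loci per subgenome
and on ChrUn and recomputes the BUSCO proportions.
All numbers are taken from the text; only the
variable names are introduced here.

```python
hc_by_subgenome = {"A": 35_345, "B": 35_643, "D": 34_212}
hc_unplaced = 2_691                      # HC models on ChrUn
hc_total = sum(hc_by_subgenome.values()) + hc_unplaced

busco_total = 1_440
busco_at_least_one = 1_436 / busco_total
busco_three_copies = 1_292 / busco_total

print(f"HC loci: {hc_total:,}")
print(f"BUSCO, at least one complete copy: {busco_at_least_one:.1%}")
print(f"BUSCO, three complete copies: {busco_three_copies:.1%}")
```

The subgenome and ChrUn counts add up to the
107,891 HC loci of v1.1, and the BUSCO proportions
match the rounded percentages given above.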


4.3.2.1 Gene Modeling Using TriAnnot 
The TriAnnot pipeline was developed and 
updated over a period of more than 10 years to 
enable automated robust structural and func- 
tional annotation of protein-coding genes, trans- 
posable elements, and conserved non-coding 
RNA genes in Triticeae genomes (Leroy et al. 
2012). It was dedicated to large-scale annotation 
projects and is executable through the command 
line on high-performance computing infrastruc- 
tures for parallelization with task dependencies. 
TriAnnot was initially used for the annotation 
of BACs (Choulet et al. 2010) and then for the 
entire chromosome 3B (Choulet et al. 2014).
Thus, it was intensively trained and custom- 
ized specifically for wheat before we assembled 
RefSeq v1. 

The specificities of the annotation strategy 
implemented in TriAnnot included: (i) mask TEs 
first in order to restrict the gene modeling to the 
non-TE space; (ii) use both evidence-based and 
ab initio approaches before selecting the best
gene model at each locus. It was launched indi- 
vidually on each scaffold (or chunks for large 
ones) of RefSeq v1.0 in parallel while positions 
of features were subsequently calculated on 
pseudomolecules. The different steps and tools 
launched by the pipeline are described below: 


• Step 1: TE annotation and sequence masking.
TEs were identified by similarity-search
using CLARITE and ClariTeRep (Daron
et al. 2014). CLARITE used RepeatMasker
with cross_match as search engine for optimized
accuracy (Smit et al. 1996-2004).
Nucleotides assigned to TEs were then
masked so that the following steps, i.e.,
ab initio predictions and similarity-searches,
were all performed on the masked genome
sequence.

• Step 2: Gene modeling. Ab initio gene models
were predicted using two gene finders
previously trained with a wheat gene dataset:
FGeneSH (http://linux1.softberry.com/berry.phtml)
and AUGUSTUS (Stanke et al. 2006).
Evidence-driven gene predictions
were also computed following three differ- 
ent strategies giving different weights to 
protein and transcript similarities. The first 
approach was based on homology with pro- 
teomes of related species. Similarity-search 
was performed using BLAST (Zhang et al. 
2000) and significant hits, filtered with fine- 
tuned thresholds, were then used for spliced- 
alignment using EXONERATE (Slater and 
Birney 2005). The query proteins were 
those predicted in main Poaceae species 
for which a genome sequence was avail- 
able: O. sativa (International Rice Genome 
Sequencing Project 2005), B. distachyon 
(The International Brachypodium Initiative 
2010), S. bicolor (Paterson et al. 2009), Z. 
mays (Schnable et al. 2009), and Hordeum 
vulgare (International Barley Genome 
Sequencing Consortium et al. 2012). This
approach is well suited to precisely determine
the structure of a large fraction
of the protein-coding genes by taking
advantage of their evolutionarily conserved
nature. However, the main limit here was
the lack of similarity at the protein extremities,
which may lead to incomplete alignments
that prevent the identification of the start and/or
stop codons. Thus, TriAnnot utilized an iterative
extension in order to identify in-frame
start and stop codons for gene modeling.
Models with partial structure were flagged
as pseudogenes.

The second evidence-driven approach 
(SIMsearch module) was based on transcripts 
first, rather than proteins. SIMsearch module 
is a gene modeling program based on FPGP 
(Amano et al. 2010) and adapted specifically
for wheat to address problems generated by
tandemly repeated genes. SIMsearch identified
the loci that are transcribed by spliced-
alignment using est2genome (Mott 1997)
of a series of wheat transcript libraries. The
CDS coordinates were predicted afterward 
through similarity with Poaceae proteomes. 
SIMsearch was launched twice using two 
databanks of wheat transcripts: (1) predicted 
transcripts derived from a large RNASeq 
experiment that targeted five plant organs at 
three development stages each in two replicates
(Pingault et al. 2015); (2) all
wheat full-length cDNAs available at EBI-ENA
and from Ogihara et al. (2004). Thus,
TriAnnot did not use RNASeq reads directly 
as an input. Read mapping and transcript 
calling were computed prior to gene annota- 
tion, and the predicted transcripts were pro- 
vided as FASTA input for spliced-alignment 
during the process of gene modeling. 

• Step 3: Selection of the best gene model
at every locus. In summary, TriAnnot predicts
gene models through five approaches:
two ab initio and three evidence-based (one
derived from spliced-alignment of homologous
proteins + two derived from transcript
evidence). The same gene may therefore be
predicted in several ways. Thus, the final
step is the selection of the best gene model at
each locus. At that step, overlapping models
were not combined to create a new one.
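
The iterative extension mentioned in Step 2 can be illustrated with a minimal sketch. It is not TriAnnot's implementation: the function name, the 0-based half-open coordinates, the forward-strand-only logic, and the choice of the nearest upstream ATG are all assumptions made for illustration. The aligned block is assumed to be in frame.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def extend_to_codons(seq: str, start: int, end: int) -> Tuple[Optional[int], Optional[int]]:
    """Extend an in-frame aligned block [start, end) to a start and a stop codon.

    Returns (cds_start, cds_end); a value is None when the codon is not found.
    """
    seq = seq.upper()

    # Walk upstream, one codon at a time, from the first aligned codon.
    cds_start = None
    pos = start
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon == START_CODON:
            cds_start = pos
            break
        if codon in STOP_CODONS:
            break  # the frame is closed before any start codon is reached
        pos -= 3

    # Walk downstream from the last aligned codon until a stop codon.
    cds_end = None
    pos = end - 3
    while pos + 3 <= len(seq):
        if seq[pos:pos + 3] in STOP_CODONS:
            cds_end = pos + 3
            break
        pos += 3

    return cds_start, cds_end


# A partial alignment covering only "CCCGGG" is extended to the full ORF.
toy = "CCATGAAACCCGGGTTTTAGCC"
print(extend_to_codons(toy, 8, 14))
```

A model for which one of the two values stays `None` would correspond to the partial structures that the text describes as flagged as pseudogenes.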


A scoring process was applied in order to vali- 
date the existence of a gene and to retain its 
most probable structure. For scoring, TriAnnot 
used BLASTP to search for similarity of each 
model with proteomes of related Poaceae, 
including Aegilops tauschii and Triticum urartu, 
and calculated a score while considering metrics
of the best hit alignment (percentage of identity 
and coverage, presence of canonical splicing 
sites, presence of start and stop codons). 

Gene models supported neither by homology
with Poaceae proteins nor by transcription
evidence (i.e., ab initio only) were simply
discarded. Models sharing similarity with known
proteins and for which splicing sites were supported
by transcript evidence were classified as
high confidence. Low-confidence genes also
share similarity with known proteins and transcripts
but lack support for some splicing sites
and/or position of start/stop codons. Finally,
genes sharing similarity with known proteins
but over less than 70% of the length of their best
BLAST hit were classified as pseudogenes.
Thus, TriAnnot predicted 107,226 gene mod- 
els: 65,884 HC and 41,342 LC genes, plus an 
additional 73,044 pseudogenes on the IWGSC 
RefSeq v1. 
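
The classification rules described above can be summarized as a small decision function. This is a hedged sketch, not TriAnnot code: the `ModelEvidence` fields, the order in which the rules are tested, and the treatment of models with only one type of evidence as low confidence are assumptions; only the 70% coverage threshold and the category names come from the text.

```python
from dataclasses import dataclass


@dataclass
class ModelEvidence:
    protein_hit: bool             # similarity with a known Poaceae protein
    transcript_hit: bool          # transcription evidence at the locus
    hit_coverage: float           # fraction of the best BLAST hit that is covered
    splice_sites_supported: bool  # all splicing sites supported by transcripts
    has_start_and_stop: bool      # both start and stop codons were found


def classify(model: ModelEvidence) -> str:
    """Assign a TriAnnot-like category to one gene model."""
    if not model.protein_hit and not model.transcript_hit:
        return "discarded"   # ab initio prediction only
    if model.protein_hit and model.hit_coverage < 0.70:
        return "pseudogene"  # covers less than 70% of its best hit
    if model.protein_hit and model.splice_sites_supported and model.has_start_and_stop:
        return "HC"
    return "LC"


# Consistency check on the counts reported in the text.
HC_MODELS, LC_MODELS = 65_884, 41_342
print(HC_MODELS + LC_MODELS)
```

The printed sum matches the 107,226 gene models reported for TriAnnot.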


4.3.2.2 PGSB Gene Prediction Pipeline 
The procedure implemented in the PGSB anno- 
tation pipeline differs in many aspects from that 
of TriAnnot. It is based on mapping all available
evidence on the unmasked genome sequence and
filtering out TE-related predictions afterward. It
was entirely evidence-driven and did not use any
ab initio gene finder.


• Step 1: Mapping. The PGSB annotation
pipeline combined spliced-alignments of
reference proteins, IsoSeq reads and full-
length cDNAs (flcDNAs), and RNASeq
transcript predictions. In addition to the
RNASeq atlas from Pingault et al. (2015)
also used in TriAnnot, additional samples
were added here: Illumina reads
produced on grain-specific samples (Pfeifer
et al. 2014), whole transcriptome PacBio
sequenced samples (PRJEB15048), and disease
resistance gene enriched transcriptome
samples (PRJEB23081). The latter were all
from CHINESE SPRING but there were also 
transcriptomic data generated from other 
accessions cultivated under drought and heat 
stresses (SRP045409) and under infection 
by Fusarium graminearum (E-MTAB-1729). 
Mapping outputs were all combined, and 
mapped reads were assembled into transcripts 
with StringTie (Pertea et al. 2015). 

Protein sequences from the five species 
Arabidopsis thaliana, B. distachyon, O. 
sativa, S. bicolor, and Setaria italica, and
complete proteins from Triticeae in UniProt
(UniProt Consortium 2018) were aligned 
with GenomeThreader independently on 
each chromosome. flcDNAs from wheat and 
barley (Mochida et al. 2009), together with 
wheat IsoSeq reads (Clavijo et al. 2017) were 
mapped with Gmap (Wu and Watanabe 2005) 
and included in the prediction pipeline. 

• Step 2: Prediction and selection of open-
reading frames. Predictions originating
from protein alignments, full-length
transcript alignments, and RNASeq
were combined while removing redundancy
(using Cuffcompare and StringTie).
Then, TransDecoder (https://github.com/TransDecoder/TransDecoder/)
was used to
predict the coding frame for each transcript 
while considering the most upstream start 
codon by default. These predictions were 
then aligned against a set of reference pro- 
teins from angiosperms in UniProt, and pro- 
tein domains were also searched for. These 
data were given to TransDecoder for select- 
ing the most probable CDS for each model. 


Since TEs were not masked prior to mapping
evidence, TE-related PGSB predictions were
filtered out afterward based on a similarity-search
against TE-related proteins from the PTREP library
(https://botserv2.uzh.ch/kelldata/trep-db).


4.3.2.3 Integration of TriAnnot and PGSB 
Gene Models with Mikado 
Selection of the best representative model at 
each locus was applied through a rule-based 
approach that combined supporting evidence 
and intrinsic gene features. PacBio transcripts, 
RNASeq reads, and homologous protein align- 
ments over the genome were used to measure 
the accuracy of predictions and a set of high- 
confidence splicing sites was established from 
RNASeq mapped reads. Mikado (Venturini et al. 
2018) was used to cluster genes from the two 
pipelines into loci, to calculate an overall score
for each gene model, and to select the highest-
scoring gene model. The score reflected the
congruence between a model and its supporting
evidence, calculated with an average F1-score
(reflecting precision and recall) and metrics 
of gene feature, e.g., a penalty was applied to 
introns larger than 10 kb. After selecting the rep- 
resentative model, Mikado was used to identify 
additional high-quality alternatively spliced tran- 
scripts, only those that met a series of stringent 
requirements. The most important were: a CDS 
overlapping at least 60% of the representative 
CDS, without any retained intron, and with only 
verified exon/intron junctions. Eventually, to 
enrich the annotation, coordinates of UTRs were 
added based on comparing models and aligned 
transcripts with PASA (Haas et al. 2008). 
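
The idea of scoring a model by its congruence with evidence, with a penalty on long introns, can be sketched as follows. This is not Mikado's scoring function: the function names, the use of intron coordinates as the only compared feature, and the penalty value of 0.1 per long intron are assumptions chosen for illustration. Only the F1-score (harmonic mean of precision and recall) and the 10 kb intron threshold come from the text.

```python
from typing import Dict, List, Set, Tuple

Intron = Tuple[int, int]


def f1_score(predicted: Set[Intron], reference: Set[Intron]) -> float:
    """Harmonic mean of precision and recall between two intron sets."""
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


def model_score(introns: List[Intron], evidence: Set[Intron],
                max_intron: int = 10_000, penalty: float = 0.1) -> float:
    """F1 against evidence, minus a fixed penalty per intron longer than max_intron."""
    long_introns = sum(1 for start, end in introns if end - start > max_intron)
    return f1_score(set(introns), evidence) - penalty * long_introns


def pick_best(models: Dict[str, List[Intron]], evidence: Set[Intron]) -> str:
    """Return the name of the highest-scoring model at a locus."""
    return max(models, key=lambda name: model_score(models[name], evidence))


evidence = {(100, 200), (300, 400), (500, 600)}
locus = {
    "model_A": [(100, 200), (300, 400)],
    "model_B": [(100, 200), (300, 20_000)],  # contains one intron > 10 kb
}
print(pick_best(locus, evidence))
```

In this toy locus, the model whose introns agree with the evidence and that carries no long intron is retained.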


4.3.2.4 Gene Confidence Assignment: HC 
Versus LC 

Despite the sophisticated combination of both 
TriAnnot and PGSB predictions, the final num- 
ber of models was very high: 269,428, repre- 
senting approximately 90,000 protein-coding 
genes per (haploid) subgenome. As previously
observed in wheat, regions showing traces
of expression or homology with known proteins
are much more abundant than expected,
given that the number of protein-coding genes
is a quite stable parameter in plant genomes,
with ~30,000 genes per haploid genome. This
suggested that many gene models were in fact
pseudogenes or doubtful non-coding transcribed
regions, for instance, even though both pipelines
included filtering steps to discard models matching
wheat transposons (before gene modeling for TriAnnot,
afterward for PGSB). Thus, a confidence category was
assigned to each gene model: high confidence
versus low confidence. The idea was to provide
a single filtered dataset of HC genes to users
interested only in large-scale whole-genome
analyses while keeping the information on LC genes
for users interested in the characterization of a
particular region.

The first classification parameter was the completeness
of the model, i.e., the presence of
both a start and a stop codon. HC genes were
complete, with significant homology with plant
(Magnoliophyta) proteins retrieved from Swiss-Prot
and TrEMBL. LC genes were either complete
but without significant homology with
plant proteins, or incomplete with or without
significant homology. The 269,428 gene models
were split into 107,891 HC (40%) and 161,537
LC (60%) protein-coding genes. The number of
HC genes was much closer to the expected value
for plants (~35,000 genes per haploid genome),
and this became the reference dataset used by
the community.
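
The HC/LC rule and the associated counts can be written down as a short sketch. The function name and boolean inputs are illustrative assumptions; what counts as "significant homology" is left to the caller, as the thresholds are not given here.

```python
def confidence_class(has_start: bool, has_stop: bool, plant_homology: bool) -> str:
    """HC = complete model with significant plant homology; everything else is LC."""
    complete = has_start and has_stop
    return "HC" if (complete and plant_homology) else "LC"


# Counts reported in the text.
HC_COUNT, LC_COUNT = 107_891, 161_537

total = HC_COUNT + LC_COUNT
hc_share = round(100 * HC_COUNT / total)  # percentage of models that are HC
hc_per_subgenome = round(HC_COUNT / 3)    # average over the A, B, and D subgenomes

print(total, hc_share, hc_per_subgenome)
```

Dividing the HC set by three gives roughly 36,000 genes per subgenome, in line with the expected value quoted above.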

However, given all the limits explained here,
we encourage users to always keep in mind the
level of uncertainty behind the annotation space.
To the question “how many protein-coding 
genes are there in wheat?” we should answer: 
We do not know because the proportion of 
doubtful predictions is just too high. 


4.3.2.5 What Should Be Known About 
the LC Genes and Pseudogenes 

The consequence of confidence assignment is
that the LC category gathered genes that were
non-conserved, i.e., possibly species-specific,
and for which we did not have enough evidence to
conclude that they are functional, together with (highly)
conserved genes that are either pseudogenes or
just partially assembled or mis-predicted. One
must keep in mind that part of the LC genes are
conserved but exhibit a likely incomplete structure.
This has strong implications for researchers
interested in a particular gene family or a particular
locus.

In addition, a specific search for pseudo- 
genes was launched at the whole-genome level, 
based on finding DNA fragments sharing simi- 
larity with HC genes but only partially or with 
frameshifts and/or internal stop codons. In total,
288,939 pseudogenes were discovered, of which
10,440 corresponded to LC genes. Thus, the
coding landscape is even more complicated than
often believed, with 108 k HC genes, 162 k LC genes, and
279 k gene fragments. Therefore, if a gene is considered
to be absent based on HC genes only, it is
important to check the pool of LC genes.


4.3.3 Comparing Genes Between A, B,
and D Subgenomes 


4.3.3.1 Finding Homeologous 
Groups Based on HC Genes 
Only Can Lead to False 
Conclusions and Highlights 
the Requirement 
of Considering LC Genes 


The conclusion of the previous section implies
that the comparison of the A-B-D
gene repertoires was strongly impacted by the
input gene dataset. Homeologous groups were
inferred from gene trees. Initially, trees were
built with the complete set of HC and LC genes,
which revealed that considering HC genes only
led to a considerable overestimate of the level of
variability between A-B-D subgenomes, because
many LC genes were, in fact, orthologous to
HC genes (i.e., homeologous in the hexaploid),
even though functional annotation revealed
that some LC genes represented mis-predicted
TE-genes (e.g., transposase-like genes). The
solution adopted was to work on a filtered gene
dataset: 181,036 genes (103,757 HC and 77,279
LC genes; instead of 269 k initially) that do not
correspond to either TE-related functions or
to pseudogenes. This led to the identification of
39,238 homeologous groups (i.e., clades of
A-B-D orthologs deduced from gene trees),
33% of which include LC genes. In total,
28,829 LC genes have homeologous partners
and were thus valid for biological analyses.

The main conclusion of the A-B-D comparison
was that the gene repertoires of the three subgenomes
differ much more than previously
thought. The default hypothesis is often that a
gene is present in three pairs of homeologous
copies in bread wheat because it is a hexaploid.
The reality is that only 55% of the homeologous
groups are triads, i.e., a single gene copy per subgenome
(configuration 1:1:1). Thus, 45% of the
groups represent cases where gene loss and/or
duplications occurred after A-B-D divergence.
Gene loss after A-B-D divergence represents the
same proportion for A, B, and D: ~10% of the
homeologous groups. Regarding gene duplications,
they also occurred in the same proportions
in A, B, and D. This analysis suggested that the
three lineages leading to A-B-D genomes have
independently accumulated differences (gene
loss and gene duplications) at similar rates.
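
The A:B:D configuration of a homeologous group can be derived by counting the members of the group per subgenome. The sketch below assumes that each group is available as a list of (gene identifier, subgenome) pairs; the identifiers used are placeholders, not real IWGSC gene names.

```python
from collections import Counter
from typing import List, Tuple

Member = Tuple[str, str]  # (gene identifier, subgenome letter: "A", "B" or "D")


def configuration(group: List[Member]) -> str:
    """Return the A:B:D copy-number configuration of a homeologous group."""
    counts = Counter(subgenome for _, subgenome in group)
    return ":".join(str(counts.get(letter, 0)) for letter in "ABD")


def is_triad(group: List[Member]) -> bool:
    """A triad carries exactly one gene copy per subgenome (1:1:1)."""
    return configuration(group) == "1:1:1"


triad = [("gene1", "A"), ("gene2", "B"), ("gene3", "D")]
loss_and_duplication = [("gene4", "A"), ("gene5", "A"), ("gene6", "D")]

print(configuration(triad), is_triad(triad))
print(configuration(loss_and_duplication), is_triad(loss_and_duplication))
```

Groups such as the second one, with a duplication in A and a loss in B, belong to the 45% of non-triad groups mentioned above.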


4.3.3.2 No Evidence of Any Biased Gene
Fractionation and Importance
of Gene Duplications
Regarding gene presence/absence, no evidence 
for biased partitioning was observed (IWGSC 
2018). In contrast, comparisons support gradual
loss/duplications that have occurred after A-B-D
divergence in the diploid and tetraploid ancestors,
and after the hexaploidization event in modern
bread wheat. Before gene loss, a gene may
lose function because of silencing or change in
expression, so that the first evidence of diploidization
might be observed at the expression level.
However, RNASeq data analyses showed that there
was an equal contribution of the three homeologous
genomes to the overall gene expression,
demonstrating the absence of global subgenome
dominance (IWGSC 2014).


4.3.4 TE Modeling 


Given the amount of TEs shaping the wheat 
genome, predicting the presence of TE copies 
along assembled sequences has always been a 
prerequisite to avoid false predictions of cod- 
ing genes that are in fact coding parts of TEs. 
Efforts to manually annotate TEs with their pre- 
cise borders were made since the beginning of 
wheat BAC sequencing and a high-quality refer- 
ence databank of wheat TE sequences was ini- 
tiated in 2002 with TREP (Wicker et al. 2002) 
and completed in 2014 with the ClariTeRep
library (Daron et al. 2014) (which includes
TREP). ClariTeRep originated from manual
curation of ~3200 TEs along the first large
(Mb-sized) contiguous sequences produced on
chromosome 3B (Choulet et al. 2010). This
implies that the wheat TE library used for
similarity-search might be biased toward elements
from the B-subgenome, and depleted for A and
D subgenomes. However, it was shown that TE
families that shaped the three subgenomes are 
the same, although subfamilies (variants) have 
differentially invaded the A-B-D genomes in the 
diploid ancestors (Wicker et al. 2018). 

Thus, TE modeling in RefSeq v1.0 was per- 
formed only via a similarity-search approach 
against ClariTeRep. There was no de novo 
repeat-based discovery of new TEs. This led to 
the prediction of 3,968,974 copies, classified 
among 505 TE families, and representing 86%, 
85%, and 83% of the A, B, and D genomes, 
respectively. Such proportions imply that TEs
shape large clusters with recently inserted TEs
into older ones, a mosaic of nested insertions
which is a computational challenge to reconstruct.
This step was handled with CLARITE
(Daron et al. 2014) for RefSeq v1.0. CLARITE
uses RepeatMasker (Smit et al. 1996-2004)
with the cross_match engine for the first step
of similarity-search between the genome and
the TE library. The main problems with using
RepeatMasker in TE-rich genomes are as follows:
(i) over-fragmentation: one copy is
often not predicted as a single feature but
rather split into adjacent fragments; (ii) the overlap
of predictions, i.e., a locus can match
several references; and (iii) scattered pieces of
a TE that has been fragmented by subsequent
TE insertions (nested pattern) are not joined. The
CLARITE pipeline has been developed specifically
for wheat, based on ClariTeRep, in order
to overcome these three limitations. It uses classification
information: all TEs in ClariTeRep
were classified into families and subfamilies
by sequence clustering. It also uses positions
of LTRs in LTR-retrotransposons, which correspond
to long terminal repeats (ca. hundreds of
bp) that are largely involved in the fragmentation
observed after RepeatMasker because
both 5' and 3' LTRs cross-match since they are
almost identical subsequences. Family classification
and LTR positions are the two main
points implemented in CLARITE. They allowed
accurate defragmentation, while preventing chimeric
merging of adjacent features, and accurate
reconstruction of nested TEs.
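
The defragmentation problem can be illustrated with a deliberately simplified sketch. It is not the CLARITE algorithm: the `Fragment` fields, the 50 bp gap threshold, and the collinearity test on reference coordinates are assumptions, and strand, overlap resolution, and nested insertions are ignored. It only shows why family membership and the matched position on the reference element help decide whether two adjacent hits belong to the same copy.

```python
from typing import List, NamedTuple


class Fragment(NamedTuple):
    start: int      # position on the genome
    end: int
    family: str     # TE family of the matched reference element
    ref_start: int  # matched positions on the reference element
    ref_end: int


def defragment(fragments: List[Fragment], max_gap: int = 50) -> List[Fragment]:
    """Merge adjacent hits of the same family that are collinear on the reference."""
    merged: List[Fragment] = []
    for frag in sorted(fragments):  # sorted by genomic start
        if merged:
            prev = merged[-1]
            same_family = prev.family == frag.family
            close = frag.start - prev.end <= max_gap
            collinear = frag.ref_start >= prev.ref_end
            if same_family and close and collinear:
                merged[-1] = Fragment(prev.start, max(prev.end, frag.end),
                                      prev.family, prev.ref_start, frag.ref_end)
                continue
        merged.append(frag)
    return merged


hits = [
    Fragment(0, 100, "famA", 0, 100),
    Fragment(110, 300, "famA", 105, 295),  # continuation of the same copy
    Fragment(320, 400, "famB", 0, 80),     # different family, kept separate
]
print(defragment(hits))
```

Two neighboring hits that both match the same region of the reference (as two LTRs would) fail the collinearity test and are not merged, which mimics the protection against chimeric merging described above.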


4.4 RefSeq V1.0 Functional
Annotation


Gene ontology terms, PFAM, and InterPro 
domains were assigned to gene models. A func- 
tion was assigned to 82% (90,919) of HC genes 
in RefSeq Annotation v1.0. RNASeq-based tran- 
scription evidence was found for 85% and 49% 
of HC and LC genes, respectively. In addition, 
naming of gene function for each gene was performed
by using the AHRD tool (Automated
Assignment of Human Readable Descriptions,
https://github.com/groupschoof/AHRD,
version 3.3.3). This program generates informative
functional annotations from BLAST outputs
while avoiding retrieving too many “unknown”
or “uncharacterized” functions. BLAST outputs
against the following databases were parsed by
AHRD: Swiss-Prot, Arabidopsis Araport11,
and a subset of TrEMBL for Viridiplantae. A
filter was then applied in order to discard genes
with functions related to TEs. Genes were thus
tagged as G (canonical gene), TE (obvious
transposon), TE? (potential transposon), or U for
unknown. Based on this, 3294 HC genes with a
TE tag were moved subsequently to the LC category
in RefSeq Annotation v1.1.


4.5 RefSeq Annotation V1.1:
Integration of Manually
Curated Genes


Once Annotation v1.0 was released to the community,
researchers with expertise in
specific gene families brought corrections to the
automated predictions: sometimes gene copies
were missing, sometimes the predicted exon/
intron structure needed to be curated. The experts
sent their feedback to the IWGSC
Annotation Group in order to release an updated
version 1.1. This concerned the gene families CBFs,
NLRs, PPRs, Prolamins, WAKs, and amino-
acid transporters. A semiautomated process was
developed in order to integrate manually curated
gene models. It relies on a Python script using
common tools like GenomeTools (Gremme et al.
2013), GffCompare (Pertea and Pertea 2020), and
pyBEDTools (Dale et al. 2011). GffCompare
was used to check that the curated genes did not
overlap each other (different teams may have
curated the same gene) and also to identify the
RefSeq Annotation v1.0 models that needed
to be updated. Five types of correction were
considered: (i) addition of a new gene model 
that was absent from v1.0; (ii) merging of two 
gene models; (iii) splitting of a gene model into 
two genes; (iv) correction of exon positions of 
a gene model; (v) complex cases which com- 
bined splitting and merging. RefSeq Annotation 
v1.1 includes updates of 3685 manually curated 
genes, of which 528 were not predicted by the 
automated annotation process and 354 corre- 
sponded to LC gene models. The final v1.1 HC 
gene set contained 107,891 genes. 
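
The five types of correction can be told apart from simple overlap counts between curated genes and v1.0 models. The sketch below is not the IWGSC integration script: it works on plain (start, end) intervals of a single sequence, ignores strand, and all names are illustrative assumptions. A one-to-one overlap is assumed to be a correction of exon positions.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open (start, end) on one sequence


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def correction_type(curated: Interval, all_curated: List[Interval],
                    old_models: List[Interval]) -> str:
    """Classify one curated gene against the previous annotation."""
    hits = [old for old in old_models if overlaps(curated, old)]
    if not hits:
        return "new"  # (i) gene absent from v1.0
    merging = len(hits) > 1  # one curated gene covers several old models
    splitting = any(sum(overlaps(c, old) for c in all_curated) > 1 for old in hits)
    if merging and splitting:
        return "complex"  # (v) combined splitting and merging
    if merging:
        return "merge"    # (ii)
    if splitting:
        return "split"    # (iii)
    return "exon_correction"  # (iv)


old = [(100, 200), (300, 400), (500, 900)]
curated = [(100, 400), (500, 650), (700, 900), (1000, 1100)]
for gene in curated:
    print(gene, correction_type(gene, curated, old))
```

In practice such a classification would be driven by the overlap report of a comparison tool rather than by raw interval tests.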


4.6 RefSeq Annotation V2: The
Challenge of Transferring
Gene Annotation Through
the Different Versions
of Genome Assembly


In 2021, an update of the CHINESE SPRING 
IWGSC RefSeq Assembly was published (Zhu 
et al. 2021). Corrections were brought to the
initial release by using new resources: Bionano 
and PacBio contigs. Inconsistencies between 
pseudomolecules and Bionano maps were rec- 
onciled, and 279 unplaced scaffolds were posi- 
tioned into pseudomolecules. PacBio contigs 
publicly available (Zimin et al. 2017) were used 
to fill gaps. Contrary to scaffold reordering, the 
gap-filling step led to complete changes in the 
positions of gene models predicted along pseu- 
domolecules, so that it was not possible to cal- 
culate new gene position from v1 to v2 with a 
simple conversion of coordinates. This raised
two possibilities: computing a de novo gene prediction
or transferring the knowledge of the previous
annotation release. Since annotation v1.1
was the outcome of an extensive effort to combine
different annotation pipelines, the choice
was made to try to transfer as many models as
possible while trying to optimize the traceability
and to minimize the differences between
Annotations v1 and v2.

However, finding the new position of a gene
required sequence alignment, which raised
many problems in hexaploid wheat. For example,
we used GMAP to map 298,775 HC and
LC genes onto Assembly v2 and observed that
32,152 (11%) could not be transferred accurately
because of spurious alignments. Such a
high error rate was not acceptable, and it was
decided to develop a transfer strategy dedicated
to this task for wheat. It was implemented in the
MAGATT pipeline (https://forgemia.inra.fr/umr-gdec/magatt).
The strategy relies on reducing
the alignment space to the shortest region predicted
to carry the gene to be mapped. In wheat,
genes are always flanked by TEs. Although TEs
are repeats, each copy is inserted into a different
site. Thus, the junction between a TE extremity
and its insertion site is unique at the genome
level. We derived all such tags from the TE
annotation. They represent one tag every 3 kb
(compared to one gene every 130 kb on average)
that can be uniquely mapped from one assembly
version to the other. We used these TE tags
as anchors to define the smallest target interval
before mapping a gene. The average size of an
interval was 9.6 kb, which reduced the alignment
space and avoided most problems due to
multiple mapping of repeated genes. Even for
clusters of tandemly repeated genes in which
copies could share 100% identity, this strategy
enabled the assignment of the correct interval
for each copy and led to the transfer of the annotation
of all copies without any cross-matching.
MAGATT succeeded in transferring 90% of HC/LC
genes without any difference between v1 and
v2 assemblies either in the introns or the exons,
and 8% with mismatches due to nucleotide differences
incorporated at the gap-filling step (in
gap-flanking sequences). Indels were observed
for 1% of the genes, and the remaining 1% corresponded
to genes for which the sequence
was discarded when assembling v2 (Zhu et al.
2021). This step gave rise to the IWGSC RefSeq
Annotation v2.1.

Defining the target interval prior to map- 
ping has a major consequence: It avoided 
the computation of a spliced-alignment of 
a query transcript/CDS. Indeed, by default 
MAGATT starts by mapping the entire gene 
feature (exons+introns+UTRs) with BLAT 
(Kent 2002) against the short, kb-sized, target 
sequence. In the majority of the cases (90%), it 
identified a full perfect match which enabled the 
repositioning of all sub-features (i.e., exons and 
UTRs of all alternatively spliced mRNAs) from
a previous to a new assembly that shared strict 
identity. This was of major importance because 
spliced-alignments could have led to errors, 
especially when exons are very small. When 
only mismatches (no Indels) were observed 
between the two assemblies for a given gene 
(3% of genes), automated repositioning was also 
possible. Spliced-alignments of mRNAs were 
computed only when BLAT returned Indels and/ 
or partial match between a query gene and its 
target. 
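
The anchor-based transfer can be sketched on toy sequences. This is not MAGATT code: an exact string search stands in for the BLAT alignment, anchors are given directly as pairs of positions (v1, v2) instead of being derived from TE junctions, and all names and coordinates are illustrative assumptions. The example shows how restricting the search to the interval between flanking anchors places each copy of a tandem duplicate correctly.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open (start, end)


def transfer_gene(v1_seq: str, v2_seq: str, gene: Interval,
                  features: List[Interval],
                  anchors: List[Tuple[int, int]]) -> Optional[List[Interval]]:
    """Reposition the sub-features of a gene from v1 to v2 using flanking anchors.

    anchors is a list of (v1_position, v2_position) pairs sorted by v1_position.
    Returns None when no flanking anchors or no perfect match are found.
    """
    v1_positions = [v1_pos for v1_pos, _ in anchors]
    left = bisect_right(v1_positions, gene[0]) - 1  # last anchor at or before the gene
    right = bisect_left(v1_positions, gene[1])      # first anchor at or after the gene
    if left < 0 or right >= len(anchors):
        return None
    lo, hi = anchors[left][1], anchors[right][1]    # target interval on v2

    query = v1_seq[gene[0]:gene[1]]
    hit = v2_seq.find(query, lo, hi)                # perfect match inside the interval
    if hit == -1:
        return None  # a spliced alignment would be needed here
    offset = hit - gene[0]
    return [(start + offset, end + offset) for start, end in features]


# Two identical gene copies; v2 has three extra bases inserted between them.
copy = "ACGTACGTAC"
v1 = "TTTT" + copy + "GGGG" + copy + "CCCC"
v2 = "TTTT" + copy + "GGAAAGG" + copy + "CCCC"
anchors = [(0, 0), (14, 14), (18, 21), (28, 31)]

print(transfer_gene(v1, v2, (4, 14), [(4, 8), (10, 14)], anchors))
print(transfer_gene(v1, v2, (18, 28), [(18, 22), (24, 28)], anchors))
```

A genome-wide search for the second copy would return the position of the first one, whereas the anchored search shifts its sub-features by the three inserted bases.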

MAGATT was developed with the objec- 
tive of transferring a gene annotation to a new 
assembly release for a given genotype. However, 
the strategy applies very well to the problem of 
annotating genes in the genome assemblies of 
other genotypes and is, thus, significant in the 
context of post-reference genome sequencing 
and pangenomics. Pangenomics aims at identi- 
fying conserved versus non-conserved genes in 
a series of assembled genomes. The main limit 
in this area is the quality of the gene predictions. 
It is therefore possible that presence-absence of
a gene may simply be the consequence of annotation
artifacts. Thus, MAGATT should be
considered for delivering an annotation of gene
models in new assemblies that mimics as much
as possible the reference gene calls and avoids
“polluting” the apparent dispensable gene set
with differences in gene predictions.


4.7 Plans for Future Improvements 


4.7.1 Improving Gene Structural 
Annotation 


The repertoire of 107,891 genes delivered in
2018 for CHINESE SPRING is a reference
widely used by the community. However,
the methodological limits mentioned above
lead us to consider that there is room for improvement.
First of all, we must recall that what we
call genes here, by default, correspond to protein-coding
genes. Non-coding RNA genes
remain largely unexplored in this complex
genome. Their prediction
along the genome sequence is one
of the most challenging tasks, but it would also provide
some of the most valuable new information to increase
our understanding of the functional sequences.

Regarding protein-coding genes, when we 
discuss the improvement of structural annotation,
we distinguish two different things: (i)
existence of the gene and (ii) structure of the
gene. In other words, improvements concern,
on one side, genes that are missing in the annotation
and gene models that do not actually
correspond to real genes. On the other side,
improvements concern the exact structure of a 
gene and its transcripts. 

A key question that impacts both aspects
is the presence of pseudogenes. Pseudogenes
are sequences derived from functional genes
that have accumulated mutations (frameshift,
in-frame stop codon, truncation) which switched
their function off. Pseudogenes are hard to model
automatically because gene modeling usually
uses structural features (coding frame, start
and stop codons) to call a gene, while in the case
of pseudogenes these features are disrupted.
Manual curation of genes remains the best
way to classify a sequence as a pseudogene.
Although no community annotation (jamboree)
event was organized in the framework of the
IWGSC, the IWGSC did establish a procedure
in order to integrate curation made by different
expert groups at the international level. This
led to several updates: annotation releases v1.1,
v1.2, and v2.1. Manual curation by experts represents
2-3% of the gene content in v2.1.

The current status with respect to wheat gene
models is: 108 k HC genes, 162 k LC genes,
plus an additional 279 k gene fragments found
by scanning for fragments of coding DNA in
the unannotated part of the genome. It is clear
that, with such a complicated landscape, manual
curation is an endless task. However, much could
be done through bioinformatic approaches combined
with manual curation in order to increase
annotation quality. But even curators need information
to decide on the most probable
gene structure, and an open question
is “which information/resources are lacking
and which strategies could be useful for helping
to increase the quality of gene model
predictions?”.


4.7.1.1 Transcription Evidence,
Gene Finders, and Homology
with Related Species: Comparing
A-B-D is the Most Highly Valuable
Option to Improve the Quality
of Structural Annotation
Finding a gene is based on three pieces of evi- 
dence: (i) a sequence is transcribed (RNASeq); 
(ii) a sequence shares similarity with proteins 
already predicted in divergent genomes; (iii) a 
sequence has a high probability to be protein- 
coding (based on hidden Markov models). 

Are we missing transcript data? As early as
2014, up to one million loci matching RNASeq
data (short reads) were highlighted but even 
then, there were still 15% of the HC genes for 
which no transcription evidence was found 
(IWGSC 2018). 

What about gene finders? The wheat 
genome is made of ca. 12 Gb of transposon- 
derived sequences while gene models represent 
0.13-0.23 Gb (depending on whether or not 
LC genes are considered). The wheat genome
is full of coding-like DNA, but the vast
majority is related to TEs (transposase, reverse
transcriptase, integrase, etc.). The consequence
is that the unannotated part of the genome, representing
ca. 10-15% (1.5-2.0 Gb), i.e., 10
times more than the gene space, often corresponds
to unidentified degenerated TEs. This
means that ORFs derived from degenerated TEs
are an extremely abundant source of false positive
predictions for gene finders.

Detecting sequence homology with related 
genomes appears to us an underestimated lever 
of improvement. This evidence relies on the evolutionary
definition of a gene: an entity subjected
to selection pressure. If a sequence is conserved
across millions of years of evolution, we can be
confident it is a gene. Predicted proteomes of
Poaceae have been used in wheat gene modeling.
However, there is obvious room for improvement here,
since not that many genomes were available.
Among the Poaceae, knowledge from the
sequenced and annotated genomes of O. sativa, 
Z. mays, S. bicolor, B. distachyon, and S. italica 
were used for wheat gene modeling. They share 
a common ancestor with wheat between 30 and 
60 MYA. Outside the Poaceae, fewer genes are 
conserved and sequence identity, even at the pro- 
tein level, is low (around 55% with Arabidopsis 
for instance) which would not be of great inter- 
est to improve the annotation. Indeed, widely 
conserved genes are the easiest to annotate. In 
contrast, the challenge of annotation relies on 
finding genes that are specific to the Triticeae 
tribe, the Triticum/Aegilops genera, or even to 
the T. aestivum species. So, the most helpful
resource to ensure efficient gene modeling in
wheat is the set of Triticeae species, whose
genomes diverged 3-13 MYA and which share a
high level of synteny and a high level of gene
sequence conservation. For instance, 88% of the predicted
wheat genes (IWGSC v2.1) share on aver- 
age 84% protein identity with barley predicted 
proteins (based on first BLAST hit alignment
with thresholds of 50% query overlap and 35%
identity) (Mascher et al. 2017). But even TEs share
sequence similarity between Triticeae genomes, 
meaning that conservation is not synonymous
with selection pressure when aligning barley and
wheat genomes. However, we could take
advantage of the near-complete TE turnover
(Wicker et al. 2018) that erased ancestral TEs, so
that there are (almost) no syntenic/orthologous
TEs between A, B, D (Triticum and Aegilops),
H (barley; Hordeum), and R (rye; Secale)
genomes. All these genomes diverged between
3 and 13 MYA, a timeframe consistent with (1)
a complete TE turnover (2) within a conserved
gene backbone. This is the ideal situation to
identify new genes based on aligning syntenic
regions. Each segment of conserved sequence
between A-B-D-H-R genomes (and others) at a
micro-syntenic location is evidence for selection
pressure and, thus, for the presence of a gene
(protein-coding or not) or a sequence involved in
regulation processes, called a conserved
non-coding sequence (CNS), as shown in Fig. 4.1.


Fig. 4.1 Sequence alignments visualized with ACT
(Carver et al. 2005) of three wheat homeologous regions
of chromosomes 3A, 3B, and 3D. CDSs are represented
in light blue, genes in white, and TEs in blue, across
the six coding frames. Red blocks represent sequence
conservation (> 85% identity) between A-B-D regions
carrying homeologous genes and surrounding regions,
while TEs are not conserved between homeologous
loci. Yellow blocks indicate the presence of a highly
conserved unannotated sequence (neither gene nor TE)
between A-B-D, which strongly suggests the presence of
a functional sequence subject to selection pressure that
may correspond to a yet uncharacterized gene


4.7.1.2 To What Extent Does Sequencing More
Wheat Genomes Help Improve
the Reference Wheat Gene
Catalog?

As explained above, the divergence window
3-13 MYA of Triticum, Aegilops, Hordeum,
Secale, and others combines the advantages
of a high level of gene conservation with
the (almost) absence of orthologous TEs.
Sequencing more T. aestivum genomes will 
not be useful in that regard. Indeed, divergence
is too low for sequence conservation to be
evidence for selection pressure. Most TEs are
conserved (orthologous) even between diver- 
gent accessions from the Asian and European 
pools, as highlighted by the Renan versus 
Chinese Spring comparison (Aury et al. 2022). 
However, sequencing more wheat genomes will 
be exploited for building the wheat pangenome. 
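
The reasoning illustrated in Fig. 4.1, where a
conserved block that overlaps neither a gene nor
a TE points to a candidate functional sequence,
can be sketched in a few lines. This is only a
minimal illustration under stated assumptions:
the function names, the half-open coordinate
convention, and the toy intervals are all
hypothetical and are not taken from any pipeline
described in this chapter.

```python
from typing import List, Tuple

# Half-open [start, end) coordinates on a single syntenic region (assumed convention).
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True when two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def unannotated_conserved(conserved: List[Interval],
                          genes: List[Interval],
                          tes: List[Interval]) -> List[Interval]:
    """Keep conserved blocks that overlap neither an annotated gene nor a TE."""
    annotated = list(genes) + list(tes)
    return [blk for blk in conserved
            if not any(overlaps(blk, feat) for feat in annotated)]


# Toy data (hypothetical): three conserved blocks, one gene, one TE.
conserved_blocks = [(100, 900), (1500, 1800), (4000, 4300)]
genes = [(50, 1000)]
tes = [(3900, 5000)]

candidates = unannotated_conserved(conserved_blocks, genes, tes)
print(candidates)
```

In practice the conserved blocks would come from
alignments of micro-syntenic regions between
Triticeae genomes, and each candidate would still
need supporting evidence before being called a
gene or a CNS.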


4.7.2 De Novo Annotation Versus 
Annotation Transfer 


With the advances made in sequencing
technologies, assembling reference-quality wheat
genome sequences is no longer a limitation (Guo
et al. 2020; Walkowiak et al. 2020; Sato et al. 
2021; Athiyannan et al. 2022; Aury et al. 2022). 
Building a wheat pangenome is thus a crucial 
objective in order to distinguish core versus
dispensable genes, especially since dispensable
genes are the best candidates for adaptation
to the environment, such as response to specific
pathogens. In contrast, core genes are enriched
in essential genes, which are not the preferred
targets when searching for genetic diversity
controlling contrasting phenotypes.

Presence/absence (and copy number) varia- 
tions of genes between two genotypes are lim- 
ited to a few percent (De Oliveira et al. 2020). 
Using resequencing data of chromosome 3B 
from 20 T. aestivum accessions, it was shown 
that variable genes represent between 2 and
6% of genes in pairwise comparisons with
CHINESE SPRING. This low percentage implies that
approximations due to incomplete genome
assembly and differences in gene predictions
will strongly impact our ability to determine
whether a gene is really absent. Thus, an
underestimated limitation to accurate
pangenome construction is the annotation step.
Automated gene modeling is strongly dependent
on the methods, tools, and thresholds used, so
that two annotations of the same genome are
systematically different. Additionally, these
differences are not only background noise. For
instance, when the IWGSC RefSeq Annotation
v1.0 was produced by combining independent
predictions from two pipelines (TriAnnot and
PGSB), 20% of each gene set did not overlap
any prediction from the other one. Moreover,
only 67 and 48% of TriAnnot and PGSB gene
models, respectively, were predicted with highly
similar structures. These differences largely
exceed the real
presence/absence variations. The consequence
is that pangenomic analyses are dependent on 
accurate mapping of a reference gene annota- 
tion to another assembly. This is why we believe 
annotation transfer tools like MAGATT (see 
paragraph RefSeq Annotation v2) are highly 
valuable in the pangenomic area as well as for 
maintaining improvements performed through 
manual curation. Eventually, in future wheat 
genome assemblies, genes will be transferred/ 
projected from a reference pangenome and de 
novo annotation should be restricted to specific 
(non-conserved) regions. Indeed, gene
projection was already applied for the annotation of
chromosome pseudomolecule assemblies of 15
wheat accessions with the objective of building 
a wheat pangenome (Walkowiak et al. 2020). 
Besides the methodological challenge, issues 
of multiple identifiers (IDs) for a gene will 
become more and more problematic, as exem- 
plified in the review of Adamski et al. (2020). 
The authors highlighted the fact that one gene
is already represented by many IDs,
sometimes following different nomenclatures, due
to the existence of multiple assemblies of the 
CHINESE SPRING genome sequence itself plus 
the release of gene models from wild wheat rela- 
tives and other cultivated genotypes. There is, 
thus, a strong need for integrating these data. 
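
The kind of comparison behind the statement that
a share of each gene set did not overlap any
prediction from the other pipeline can be
sketched as follows. This is a simplified
illustration, assuming gene models reduced to
(chromosome, start, end) tuples with half-open
coordinates; the function name and the toy
coordinates are hypothetical and do not reproduce
the actual IWGSC comparison.

```python
from typing import List, Tuple

# A gene model reduced to (chromosome, start, end), half-open coordinates (assumed).
GeneModel = Tuple[str, int, int]


def fraction_unmatched(set_a: List[GeneModel], set_b: List[GeneModel]) -> float:
    """Fraction of gene models in set_a that overlap no gene model in set_b."""
    unmatched = 0
    for chrom, start, end in set_a:
        hit = any(c == chrom and start < e and s < end for c, s, e in set_b)
        if not hit:
            unmatched += 1
    return unmatched / len(set_a)


# Toy predictions from two hypothetical pipelines on the same chromosome.
pipeline_1 = [("3B", 100, 500), ("3B", 1000, 1500), ("3B", 3000, 3400),
              ("3B", 5000, 5600), ("3B", 8000, 8300)]
pipeline_2 = [("3B", 150, 520), ("3B", 1100, 1400), ("3B", 3100, 3500),
              ("3B", 5100, 5500), ("3B", 9000, 9400)]

only_in_1 = fraction_unmatched(pipeline_1, pipeline_2)
only_in_2 = fraction_unmatched(pipeline_2, pipeline_1)
print(only_in_1, only_in_2)
```

A locus-level overlap test such as this one is
deliberately permissive: two models can overlap
and still differ in exon-intron structure, which
is the second source of discrepancy mentioned
above.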


4.7.3 Functional Annotation: 
Opportunities 


An automated functional annotation workflow
based on sequence similarity and domain search
has been established by the IWGSC to assign gene
ontology (GO) terms and function descriptions to
the wheat reference gene set (IWGSC 2018).
Although approaches based on local alignment
search such as BLAST are straightforward and
work well for certain species and gene families,
the drawbacks are clear. They suffer from low
sensitivity or specificity, depending on threshold
choice and the evolutionary distance of the query
gene set to species in the annotation source (Sasson
et al. 2006). In addition, errors or a lack of robust
annotation evidence in the source databases
hinder or bias large-scale functional annotation
analysis, especially in non-model crop species.
To overcome these limitations, integrating 
various omics datasets from high-throughput 
experiments in combination with novel
computational approaches has been considered as
a complement to local sequence alignment
methods, facilitating annotation of unknown
genes or transferring functional knowledge from 
one gene to another. For example, generation 
and analysis of large-scale biomolecule interac- 
tion networks is a useful approach that utilizes 
omics data beyond gene/protein sequences. 
The basic idea is “guilt by association,” where
a gene can be assigned a particular function if
it is co-expressed with one or several genes of
the same known function, as the chance that they
are co-regulated and needed for the same pro- 
cess or pathway is high (Tohge and Fernie 2012; 
Aoki et al. 2016). In addition to co-expression, 
gene-gene relationships such as protein-DNA 
binding and protein-protein interactions can be 
used to assign and transfer function from one 
gene to the other (Cho et al. 2016). Such interac- 
tome data can now be generated with advanced 
high-throughput experimental techniques such 
as single/bulk RNAseq, Yeast 2-Hybrid, and 
DNA affinity purification sequencing (DAPseq). 
Each type of interactome network can be
analyzed separately or in a combined manner to
build a multi-omics integrated network, followed
by computational interpretation, ranging from
naive evidence aggregation to probabilistic
modeling (Yu et al. 2015). The ranking or
scoring reflecting proximity or connectivity of genes
in the network is then used to link and transfer 
function from one gene to the other. Beyond 
the classic "single-gene" approach, integrated 
network-based approaches provide a more 
holistic view of gene function and gene-gene 
relationships, enabling functional annotation of
unknown genes that are not related at the sequence
level but interact functionally with studied
genes (see also Chap. 11).
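
The "guilt by association" principle described
above can be illustrated with a deliberately
naive sketch: a query gene receives the function
most often carried by the annotated genes whose
expression profiles correlate strongly with its
own. The Pearson threshold of 0.9, the function
names, the gene names, and the expression values
are all assumptions made for illustration; real
network-based methods use far richer scoring.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sx * sy)


def transfer_function(query: List[float],
                      annotated: Dict[str, Tuple[List[float], str]],
                      min_r: float = 0.9) -> List[Tuple[str, int]]:
    """Rank functions by the number of annotated genes co-expressed with the query."""
    votes: Dict[str, int] = {}
    for _gene, (profile, function) in annotated.items():
        if pearson(query, profile) >= min_r:
            votes[function] = votes.get(function, 0) + 1
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy expression profiles across five samples (hypothetical values).
annotated_genes = {
    "geneA": ([1, 2, 3, 4, 5], "starch biosynthesis"),
    "geneB": ([2, 4, 6, 8, 11], "starch biosynthesis"),
    "geneC": ([5, 4, 3, 2, 1], "cold response"),
}
query_profile = [1.1, 2.0, 2.9, 4.2, 5.0]

ranked = transfer_function(query_profile, annotated_genes)
print(ranked)
```

The arbitrary threshold is exactly the kind of
cutoff whose choice can bias the resulting
annotation, which is one motivation for the
learning-based approaches discussed next.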

The choice of cutoff for sequence similarity
search and network mining is crucial but highly
arbitrary, which can create bias or error in
the functional annotation process. In addition,
links between various protein features (structure,
text description, and interaction) and annotation
labels that could be utilized for functional
annotation are sometimes beyond human knowledge
and difficult to reveal. In contrast,
machine learning tools are suitable for identifying
these hidden features and assessing their
contribution to functions by analyzing a training
set in which a group of genes with these features
are functionally characterized (Mahood et al.
2020). The quantitative contribution of different
features learnt by the computer is then exploited to
predict the most likely function of unknown
genes possessing the same feature types. Several
tools have been developed to learn the relationship
between GO and heterogeneous data (text and
sequence information, protein structure) and
propose a predictor for annotating unknown
genes (Törönen et al. 2018; You et al. 2018,
2019).

Although highly advantageous compared
to classical approaches, conventional machine
learning is achieved using handcrafted features.
Deep learning using neural networks, on the
other hand, can extract abstracted and high-level
features from raw data directly and build a
predictor, without human intervention. The
availability of omics data and computational resources
allows the development of sophisticated deep
learning algorithms for large-scale functional
annotation. Various deep learning architectures have
been built using, e.g., deep, convolutional and
recurrent neural networks, which have specific
strengths in learning different features (Cao
et al. 2017; Sureyya Rifaioglu et al. 2019; Du
et al. 2020). Tools built on these architectures
predict GO terms by learning from protein
sequence (Kulmanov and Hoehndorf 2020; Cao
and Shen 2021), protein structure (Tavanaei
et al. 2016; Jumper et al. 2021) or heterogeneous
data and networks (Cai et al. 2020; Peng et al.
2021). Several factors limit the application of
deep machine learning approaches for functional
annotation in a large-scale and unbiased manner.
Firstly, although various omics and structure
data are useful, only the primary sequence is
available for the majority of unknown genes. Secondly,
the imbalance and incompleteness of the GO database
with respect to species and function categories
can bias the learning step, and the GO prediction
task itself is a complex multi-label problem.
Lastly, the quality of transferring gene model
information between species that are
evolutionarily distant needs to be assessed carefully.
Nevertheless, despite these challenges, deep
machine learning-based functional annotation
and GO assignment have been successfully
applied and will continue to be applied in many
studies, with the support of the continuing
expansion of high-quality omics and experimental
datasets.
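
To make the multi-label nature of GO prediction
concrete, the short sketch below scores
predictions in which each protein may carry
several GO terms at once, using a micro-averaged
F1 measure. This is one common way to evaluate
multi-label predictions, shown here as an
assumption rather than as the metric used by any
of the tools cited above; the GO identifiers serve
only as illustrative labels and the predictions
are invented.

```python
from typing import List, Set


def micro_f1(true_sets: List[Set[str]], pred_sets: List[Set[str]]) -> float:
    """Micro-averaged F1 over proteins, each annotated with a set of labels."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))  # correct terms
    fp = sum(len(p - t) for t, p in zip(true_sets, pred_sets))  # spurious terms
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))  # missed terms
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Two toy proteins: curated term sets versus predicted term sets (hypothetical).
curated = [{"GO:0005524", "GO:0004672"}, {"GO:0003677"}]
predicted = [{"GO:0005524"}, {"GO:0003677", "GO:0046872"}]

score = micro_f1(curated, predicted)
print(round(score, 3))
```

Because rare terms contribute few counts to such
a pooled score, an imbalanced GO database can hide
poor performance on under-represented function
categories.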
Abstract


Gene expression patterns have been a widely 
applied source of information to start under- 
standing gene function in multiple plant 
species. In wheat, the advent of increas- 
ingly accurate and complete gene annota- 
tions now enables transcriptomic studies to 
be carried out on a routine basis and studies 
by groups around the world have compared 
gene expression changes across an array of
environmental conditions and developmental stages.
However, associating data from differentially 
expressed genes to understanding the biologi- 
cal role of these genes and their applications 
for breeding is a major challenge. Recently, 
the first steps to apply network-based 
approaches to characterise gene expression 
have been taken in wheat and these networks
have enabled the prediction of gene
functions in wheat but only for a handful of traits.
Combining advanced analysis methods with 
better sequencing technology will increase 
our capacity to place gene expression in 
wheat in the context of functions of genes 
that influence agronomically important traits. 


Keywords 


Wheat transcriptome · Gene networks ·
Response to environment · Development


5.1 Gene Function Through Gene 
Expression 


In order to understand gene function, one of 
the first things researchers would like to do is 
measure gene expression—when, where and 
how much of a gene’s transcript is present? 
Measuring the expression level of a single gene 
through quantitative PCR can reveal insight 
into a specific gene and its potential biological 
role. However, to explore the integrated nature 
of gene expression and how entire biological 
processes work at the transcriptional level, it 
is desirable to measure the expression level of 
multiple genes simultaneously using transcrip- 
tomics. In model species, transcriptomics has
shed light on the regulation of developmental
processes, responses to the environment and
genotype-specific responses, all of which would
be highly advantageous to understand for wheat 
improvement. Therefore, transcriptomics has 
been widely applied in wheat biology. 

Initially, transcriptomics largely relied upon 
microarray approaches. These were useful 
in determining gene expression patterns, but 
microarrays in wheat were limited because of 
the incomplete gene model annotations avail- 
able when microarrays were designed, therefore 
many genes were missing from the arrays. The 
advent of RNA-seq to measure gene expres- 
sion enabled more accurate measurement of the 
wheat transcriptome. Transcriptomics could be 
applied even before high-quality genome assem- 
blies were available because de novo transcrip- 
tome assemblies could be generated to answer 
specific biological questions using individual 
datasets. However, to get the highest quality and 
most comprehensive results in a transcriptomic 
experiment, having a reference transcriptome 
is valuable and also removes the requirement 
to carry out a de novo assembly for each new 
project. Furthermore, the availability of a refer- 
ence transcriptome facilitates the identification 
of homoeolog-specific transcripts and therefore 
allows gene expression to be quantified in a 
homoeolog-specific manner. 


5.2 Measuring Homoeolog-Specific
Gene Expression


As a consequence of the polyploid nature of
wheat, >50% of genes in the wheat genome
are present as triads of related homoeologous
genes on the A, B and D subgenomes (IWGSC
et al. 2018). Studies on a gene-by-gene basis
have revealed that each homoeolog in wheat can
have different expression levels. For example,
the calcium-dependent protein kinase TaCPK2
has differential responses to stress between
homoeologs, with the A homoeolog upregulated
in response to powdery mildew infection and
the D homoeolog upregulated in response to
cold stress (Geng et al. 2013). However, to
analyse homoeolog-specific expression using 
qPCR is labour-intensive and requires the 


T. Andleeb et al. 


design of homoeolog-specific primers for each 
gene of interest. The use of transcriptomics 
allows quicker and easier homoeolog-specific 
gene expression measurements. Several differ- 
ent ways to quantify homoeolog-specific gene 
expression in allopolyploids have been imple- 
mented including alignment to the individual 
subgenomes and read classification according 
to mismatches or inter-homoeolog SNPs (Kuo 
et al. 2020), alignment to the whole genome 
sequence using a standard aligner and select- 
ing only uniquely mapping reads (e.g. He et al. 
2022) or pseudoalignment to the transcriptome
using kallisto, which has been demonstrated
to assign reads to the appropriate homoeolog
using nullitetrasomic lines (Borrill et al. 2016;
Ramírez-González et al. 2018).
Homoeolog-specific gene analysis has been used to study
multiple biological questions and has for exam- 
ple revealed homoeolog-specific gene expression 
responses to stress conditions (e.g. Clavijo et al. 
2017) and developmental stage and tissue-spe- 
cific homoeolog expression (Ramírez-González 
et al. 2018). In order to maximise information 
gained from applying transcriptomic approaches, 
it is necessary to define which genes are pre- 
sent within the genome and have accurate gene 
annotations to capture the complexities of gene 
expression in this polyploid species. 
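
Once homoeolog-specific expression values are
available, a simple way to summarise a triad is
to express each homoeolog as a proportion of the
total triad expression. The sketch below assumes
expression is given in transcripts per million
(TPM); the function name and the values are
hypothetical, and the normalisation is shown as a
general illustration rather than as the exact
procedure of any study cited above.

```python
from typing import Optional, Tuple


def triad_proportions(tpm_a: float, tpm_b: float,
                      tpm_d: float) -> Optional[Tuple[float, float, float]]:
    """Relative contribution of the A, B and D homoeologs to triad expression.

    Returns None for a triad with no detectable expression, because the
    proportions are undefined when the total is zero.
    """
    total = tpm_a + tpm_b + tpm_d
    if total == 0:
        return None
    return (tpm_a / total, tpm_b / total, tpm_d / total)


# Toy triad (hypothetical TPM values): the D homoeolog dominates.
proportions = triad_proportions(10.0, 30.0, 60.0)
print(proportions)
```

A balanced triad would give proportions close to
one third each, while a strongly skewed triad
indicates homoeolog expression bias in the tissue
or condition sampled.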


5.3 Building Transcriptome
Annotations in Wheat


5.3.1 Expressed Sequence Tags
and Full-Length cDNAs


The large size of the wheat genome made 
sequencing the entire wheat genome and the 
genes within it a difficult prospect in the 1990s 
and 2000s due to the high cost and sequenc- 
ing technology limitations (see also Chap. 1). 
However, the importance and usefulness of 
having gene sequence information was clear. 
An alternative way to obtain gene sequence
focussed on expressed sequence tags (ESTs),
which provided a quicker way to determine
gene sequences and expression information
(Fig. 5.1). ESTs were generated by extracting
RNA from a tissue or tissues of interest and
building a cDNA library in E. coli. Plasmids
from the E. coli library were extracted and
sequenced through Sanger sequencing before
bioinformatic analysis to group sequences into
contigs containing related sequences. ESTs were
generated from multiple wheat tissues (Ogihara
et al. 2003; Manickavelu et al. 2012) and
samples grown under stress conditions (Chao et al.
2006; Mochida et al. 2006), resulting in the
identification of over 1 million EST sequences
grouped into tens of thousands of contigs. By
filtering these contigs for sequences containing
both start and stop codons, it was possible
to identify full-length cDNAs representing
entire coding sequences, although the numbers
were significantly lower than the number of
ESTs. For example, the 1 million ESTs generated
by Manickavelu et al. (2012) were classified
into 37,138 contigs of which ~7000 were full
length. Significant efforts were made to obtain
a good representation of full-length cDNAs, and
the resulting sequences (~20,000 full-length
cDNAs) were gathered into databases (Kawaura
et al. 2009; Mochida et al. 2009).


Fig. 5.1 Improvements in transcriptome assemblies
in the last 20 years. Transcriptome sequences have
progressed from expressed sequence tags (EST) which had
unknown chromosomal positions and were often partial
sequences, through full-length cDNAs (flcDNAs)
to the initial genome assemblies (454 assembly) which
often lacked annotation, through to fragmented
assemblies with gene model predictions such as the CHINESE
SPRING Survey (CSS) and The Genome Analysis
Centre (TGAC) assembly, to highly complete
transcriptome assemblies on contiguous chromosome-scale
scaffolds (RefSeqv1.1). Sequencing and assembly of
transcriptomes for multiple wheat cultivars will reveal the
pan-transcriptome and variation therein including copy
number variation (CNV)
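
The filtering of EST contigs for sequences
containing both a start and a stop codon can be
sketched as a search for a complete open reading
frame. This is a simplified illustration: it
scans the forward strand only, the minimum length
of 50 codons is an arbitrary assumption, and the
function name and test sequences are hypothetical
rather than taken from the cited studies.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_complete_orf(seq: str, min_codons: int = 50) -> bool:
    """True if the forward strand holds an ATG followed in frame by a stop codon.

    The ORF length, counted in codons from the ATG up to but excluding the
    stop codon, must reach min_codons.
    """
    seq = seq.upper()
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon from the start codon until the first in-frame stop.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    return True
                break  # ORF too short; try the next ATG
    return False


# Toy contigs: one with start and stop codons, one truncated before the stop.
full_length = "GG" + "ATG" + "GCT" * 60 + "TAA" + "CC"
truncated = "ATG" + "GCT" * 60

print(has_complete_orf(full_length), has_complete_orf(truncated))
```

A contig truncated at its 3' end, as in the
second toy sequence, fails the test even though
it encodes most of the protein, which is one
reason full-length cDNAs were far fewer than ESTs.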


5.3.2 Integrating Gene Annotation
into Genome Assemblies


In parallel with the development of flcDNA 
libraries, many groups embarked upon
projects to sequence the wheat genome. The first
sequence of a wheat genome with associated 
gene annotations was published in 2012 using 
the cultivar CHINESE SPRING (Brenchley 
et al. 2012). The low sequencing coverage (5x) 
using 454 technology meant that the assembly 
was highly fragmented (over 5 million scaf- 
folds), yet it was extremely useful to research- 
ers offering the first extensive set of genomic 
sequences. Approximately 95,000 genes were 
annotated using orthologs to flcDNAs from 
rice, sorghum, Brachypodium and barley. Two- 
thirds of these genes were assigned to the A, B 
or D subgenome but it was not possible to assign 
genes to individual chromosomes. These data
provided a larger number of gene annotations than
were available from flcDNAs, although not all
flcDNAs were represented and many of the gene
models were fragmented (Fig. 5.1). Nonetheless, 
this assembly illustrated that whole genome 
sequencing of wheat was possible and could 
make major contributions to generating a com- 
plete set of gene models. 

The next major improvement in gene models 
was achieved by applying flow-sorting technol- 
ogy to separate individual chromosome arms 
prior to sequencing (see Chap. 3). This allowed 
gene models to be assigned to individual chro- 
mosome arms, identifying homoeologous genes 
with confidence, and positional information was 
added through the use of synteny and genetic 
mapping approaches. In total 124,201 genes 
were annotated and assigned to individual chro- 
mosomes, and 75,183 had positional informa- 
tion. These genes were located across a total 
10.2 Gb assembly of CHINESE SPRING (the 
CHINESE SPRING Survey; CSS; Fig. 5.1; 
IWGSC et al. 2014). However, the fragmented
nature of this assembly, with only 70% of the
assembly in contigs longer than 1 kb, meant that
although the number of genes identified was
high, many genes were not full length, for
example due to a gene model being truncated at the
end of a contig (Brinton et al. 2018).

Improvements to assembling complete gene 
models came largely through improved
contiguity in genome assemblies. The use of varying
sized mate-pair libraries and a new assembly
algorithm produced a new CHINESE SPRING
assembly (Clavijo et al. 2017) with longer
contigs, with over 80% of the assembly in
contigs larger than 32 kb. In total 104,091
gene models were annotated, which is ~20,000
genes fewer than in the CSS assembly (IWGSC
et al. 2014), but these new gene models were
generally more complete because the higher
assembly contiguity meant it was much less
likely that a gene model was truncated at
the end of a contig (Fig. 5.1). An additional
CHINESE SPRING assembly (Triticum3.1)
achieved much-increased contiguity by combining
Illumina short reads with PacBio long reads,
with over 50% of the assembly in contigs
larger than 232 kb (Zimin et al. 2017), but this
assembly lacked gene annotations.
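
The contiguity figures quoted above are of two
related kinds: the share of an assembly held in
contigs longer than a given size, and the contig
length at which half of the assembly is reached
(commonly reported as N50). Both can be computed
from a list of contig lengths, as in the sketch
below; the function names and the toy lengths are
hypothetical and chosen only to make the
arithmetic easy to follow.

```python
from typing import List


def fraction_in_long_contigs(lengths: List[int], min_len: int) -> float:
    """Share of the total assembly length held in contigs longer than min_len."""
    return sum(n for n in lengths if n > min_len) / sum(lengths)


def n50(lengths: List[int]) -> int:
    """Length of the contig at which the running total reaches half the assembly."""
    half = sum(lengths) / 2
    running = 0
    for n in sorted(lengths, reverse=True):
        running += n
        if running >= half:
            return n
    raise ValueError("empty assembly")


# Toy assembly of five contigs (hypothetical lengths, total 1000).
contigs = [400, 300, 150, 100, 50]

share = fraction_in_long_contigs(contigs, 100)
half_point = n50(contigs)
print(share, half_point)
```

A gene model is more likely to be complete when
the contigs are long relative to gene length,
which is why gains in these statistics translated
into fewer truncated gene models.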

The next step change came with the publi- 
cation of the RefSeqv1.0 CHINESE SPRING 
genome assembly (IWGSC et al. 2018). This
pseudomolecule-level 14.5 Gb assembly used 
a de novo assembly approach, an improved 
assembly method and additional layers of 
genetic, physical and sequencing data to gener- 
ate a long-range ordered assembly with accu- 
rate assignment of homoeologs. In total 107,891 
high-confidence genes were annotated by com- 
bining the outputs of two prediction pipelines. 
These gene models represented a higher propor- 
tion of conserved BUSCO single-copy genes 
than previous assemblies, with 90% of BUSCO
genes present as three complete copies in the
RefSeq assembly, compared to 70% in the
TGAC assembly and 25% in the CSS assembly.
Approximately 2,000 gene models were manually
refined, resulting in the RefSeqv1.1 gene
model set (Fig. 5.1).

Although highly complete, further improve- 
ments have been made to these gene models. 
By combining the long-read-based Triticum
aestivum 3.1 genome assembly with information
from the RefSeqv1.0 assembly to improve
scaffolding and annotation, a more complete
(15.1 Gb) annotated CHINESE SPRING
assembly was obtained: Triticum_aestivum_4.0
(Alonge et al. 2020). The use of long reads
enabled many repeat regions to be expanded
in this assembly, including regions containing
thousands of additional gene copies. This gave
a total of 108,639 genes localised to individual
chromosomes. In parallel, further refinements
were made to the RefSeqv1.0 by incorporating
optical maps and PacBio long reads to generate
RefSeqv2.1 (Zhu et al. 2021). Although the total
assembly size did not change much (14.6 Gb
in RefSeqv2.1 vs. 14.5 Gb in RefSeqv1.0),
positions and orientations of scaffolds were
corrected for 10% of the genome and gaps
were filled. In total 106,913 high-confidence
genes were annotated by aligning gene
annotations from the RefSeqv1.1 and community
annotations.


5.3.3 Remaining Challenges 
to Improve the Accuracy 
and Completeness of the Gene 
Model Set 


Discrepancies remain between the Triticum aes- 
tivum 4.0 and RefSeqv2.1 assemblies in some 
regions, and integration of new data types will 
be required to resolve localised gaps or errors, 
and to assign all scaffolds to accurate posi- 
tions. Gene annotations may also be inaccurate 
in a minority of regions due to remaining gaps 
or inaccuracies. Both these assemblies rely on 
the transfer of gene models from RefSeqv1.1, 
so there may be value in re-annotating these 
genomes from de novo predictions and RNA- 
seq data to take advantage of these more accu- 
rate sequences. A final consequence of relying 
largely on the RefSeqv1.1 gene models is that
alternatively spliced isoforms may not be fully
represented, with only 15.7% of high-confidence
genes having alternative isoforms (IWGSC et al.
2018), due to conservative parameters used
during the transcriptome assembly.

Although technical challenges remain to 
perfect the CHINESE SPRING gene models, a 
more pressing challenge will be to identify vari- 
ation between gene models in different wheat 
cultivars. Work by Montenegro et al. (2017)
showed that gene content was variable between
18 wheat cultivars, with ~81,000 genes shared
between all cultivars and an additional 60,000
genes detected in at least one cultivar. The large
average number of genes detected in each
cultivar in this study (128,656) may be an artefact of
basing gene model discovery on the fragmented
CSS assembly; nonetheless, the variation in
gene models is likely to have significant
consequences for understanding wheat biology (see
Chap. 4). More recently, whole genome
sequencing of 15 cultivars in addition to CHINESE
SPRING revealed extensive structural and
haplotype divergence between wheat cultivars
(Fig. 5.1; Walkowiak et al. 2020). Significant
differences were found in gene content between
cultivars, with ~12% of genes showing
presence-absence variation, although this was based
on projecting gene annotations from CHINESE
SPRING, rather than de novo genome
annotation tailored to each cultivar. Individual genome
annotations for each of these high-quality
genome sequences will be a valuable resource
for biologists and breeders alike and are likely to
identify genes absent from CHINESE SPRING.
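
The distinction between genes shared by all
cultivars and genes detected in only some of them
can be expressed with simple set operations once
a presence/absence table is available. The sketch
below is a minimal illustration; the cultivar and
gene names are hypothetical placeholders, and real
analyses must first decide when a gene counts as
present, which is where assembly and annotation
quality matter most.

```python
from typing import Dict, Set, Tuple


def split_core_variable(presence: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split genes into those present in every cultivar and those missing from some."""
    gene_sets = list(presence.values())
    all_genes = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    variable = all_genes - core
    return core, variable


# Toy presence table (hypothetical): genes detected in each of three cultivars.
presence_table = {
    "cultivar_1": {"g1", "g2", "g3"},
    "cultivar_2": {"g1", "g2", "g4"},
    "cultivar_3": {"g1", "g2", "g3", "g5"},
}

core_genes, variable_genes = split_core_variable(presence_table)
print(sorted(core_genes), sorted(variable_genes))
```

With gene annotations projected from a single
reference, a gene absent from that reference can
never enter the table, which is why de novo
annotation of each cultivar is expected to reveal
additional variable genes.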

Beyond increasing the number of cultivars, 
it will also be important to increase the accu- 
racy of gene models beyond the coding region, 
which is so far the most accurate portion of 
wheat gene models. The 5' and 3' untranslated 
regions are annotated in many genes, but their 
accuracy is not known and specialised next-gen- 
eration sequencing approaches could be used, 
such as CAGE-seq to identify transcription start 
sites and PolyA-seq to identify transcription 
end sites, as has been done in cotton to gener- 
ate accurate untranslated region annotations 
(Wang et al. 2019). The use of PacBio Iso-seq 
long reads in conjunction with Illumina short 
reads and stringent filtering can also increase 
the accuracy of transcript start and end sites, 
as well as providing information about splice 
junctions. This has been achieved in wheat's 
close relative barley (Coulter et al. 2021). This 
approach identified that 73% of multi-exonic 
barley genes had two or more transcript iso- 
forms, suggesting that the current wheat anno- 
tations may be missing transcript isoforms in 
many multi-exonic genes.


5.4 Methods of Measuring Gene
Expression at the 
Genome-Wide Level 


The availability of high-quality gene models 
now facilitates the accurate measurement of
gene expression using RNA-seq. The most com- 
mon type of RNA-seq is the enrichment and 
subsequent sequencing of polyadenylated RNA 
to study mRNA levels. Reduced representation 
sequencing can also be applied to reduce costs. 
For example, 3’ end sequencing can be used for 
investigating the expression profile of genes at a 
lower cost due to reduced sequencing
requirements, and targeted RNA-seq can be used to
sequence specific targets, primarily those with
low expression profiles. More recently, low
input RNA-seq methods from small tissues to 
single-cell approaches have been developed. 
These enable the measurement of gene expres- 
sion in different cell types and determine co- 
expression and gene regulation in single cells, 
although their application in wheat remains 
limited. 


5.5 Diverse Biological
Questions Can Be Answered
with Transcriptomics


Transcriptomics approaches have been applied 
in many different types of studies in wheat. 
These include observing changes in the tran- 
scriptome over a developmental time course, 
studying gene expression responses to differ- 
ent stresses or investigating the effect of a spe- 
cific gene on downstream molecular pathways 
(Fig. 5.2). 


5.6 Elucidating Genetic Control
of Developmental Processes


Transcriptomic approaches can help build 
understanding of developmental processes by 
studying gene expression throughout a time 
course or by focussing on the transcriptional
changes induced by manipulating a gene
regulating development, for example through
mutants or overexpression. Here we will dis- 
cuss typical approaches which use RNA-seq to 
understand developmental processes in wheat. 


5.6.1 Studying Gene Expression
During Time Courses


Grain development is an important process 
which influences final yield and quality in all 
cereal crops and has therefore been examined 
at the transcriptomic level by several groups. 
For example, using the CHINESE SPRING 
Survey (CSS) sequence annotation, Pfeifer 
et al. (2014) identified cell-type and
homoeolog-specific gene expression during grain
development at three timepoints. Building upon 
this work Chi et al. (2019) studied gene expres- 
sion across four timepoints in grain develop- 
ment, although they did not dissect grains into 
individual cell types. Differentially expressed 
genes were clustered into groups based on 
developmental stages and assigned putative 
functions based on gene ontology (GO) and 
Kyoto Encyclopaedia of Genes and Genomes 
(KEGG) enrichment analyses. Many more dif- 
ferentially expressed genes were identified than 
was possible using previous microarray-based 
approaches and the more accurate and complete 
gene models facilitated the analysis (Yu et al. 
2016). A similar approach was used to investi- 
gate wheat spike development at four different 
stages (Feng et al. 2017). Clustering analysis
of genes differentially expressed over the time 
course identified dynamically expressed tran- 
scription factors which the authors hypothesise 
may regulate spikelet initiation and floral organ 
patterning, inferred from their times of expres- 
sion and orthologs in model plants. The puta- 
tive functions of the differentially expressed 
genes found in this study were assigned using 
GO enrichment analysis, giving an insight into 
the functions of individual genes as well as 
temporal dynamics of expression (Feng et al. 
2017). 
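
The enrichment step described above can be illustrated with a small calculation. The following is a minimal sketch, assuming a one-sided hypergeometric test, which is one common way to score GO term over-representation; the cited studies may have used different tools and corrections. The gene identifiers and set sizes are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_annotated, n_degs, n_overlap):
    """Probability of seeing at least n_overlap annotated genes when n_degs
    genes are drawn without replacement from n_background genes, of which
    n_annotated carry the GO term (one-sided hypergeometric test)."""
    upper = min(n_annotated, n_degs)
    total = comb(n_background, n_degs)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy data: 20 expressed genes, 5 of them annotated with one GO term.
background = {f"gene{i}" for i in range(1, 21)}
go_term_genes = {"gene1", "gene2", "gene3", "gene4", "gene5"}
# Hypothetical differentially expressed genes from one time-course cluster.
degs = {"gene1", "gene2", "gene3", "gene10"}

overlap = degs & go_term_genes
p_value = hypergeom_enrichment_p(
    len(background), len(go_term_genes), len(degs), len(overlap)
)
print(f"{len(overlap)} of {len(degs)} DEGs carry the term, p = {p_value:.4f}")
```

In a real analysis the test is repeated for every GO term and the resulting p-values are corrected for multiple testing, which this sketch omits.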

Fig. 5.2 RNA-seq is frequently used to assess the
effects of altering a single gene or environmental/developmental
change on gene expression. The data collected
is used to identify differentially expressed genes (DEGs)
which can then be analysed through methods including
pathway or gene ontology (GO) analysis, or by clustering
gene expression profiles. Specific exploration of differentially
expressed genes, pathway and clustering
information can uncover the biological pathways and
mechanisms through which a gene or environmental/
developmental response operates


5.6.2 Understanding the Influence
of Individual Genetic
Components on a
Developmental Process


Understanding general expression changes dur- 
ing development is important, but many geneti- 
cists aim to characterise the precise effects of 
individual genes and RNA-seq can contribute 
to this goal. Flowering time is one of the best- 
characterised processes in wheat with many 
important genes identified. Transcriptomic 
approaches have deepened our understand- 
ing of flowering time pathways by comparing 
the expression profiles of wild type and plants 
mutated in or overexpressing key floral regula- 
tors (see also Chap. 11). For example, Pearce 
et al. (2016) studied the phytochrome light
receptors using RNA-seq-based methods to bet- 
ter understand how they regulate the develop- 
mental transitions controlled by changes in light 
levels. Under long-day conditions, PHYB was 
found to regulate approximately six times more 
genes than PHYC and only a small number of 
genes were under transcriptional control of both 
phytochrome genes. Similarly, under short-day 
conditions PHYB influenced the transcription of 
approximately five times more genes than PHYC 
(Kippes et al. 2020). Surprisingly, in phyB and
phyC mutants flowering was accelerated under
short-day conditions, which is unexpected in
a long-day plant like wheat. Transcriptomic
analysis revealed this may be mediated through
the flowering-promoting genes VRN-A1 and PPD-B1.
This work shows that these RNA-seq transcriptome
methods can uncover the functions
of genes in a developmental process as well as 
identify downstream targets of these genes. 


5.6.3 Atlases of Gene Expression 


Beyond individual studies of gene expres- 
sion, collating gene expression data for future 
analysis via gene expression atlases allows 
researchers to address a range of biological 
questions without the need to carry out more 
RNA-sequencing. Several different atlases 
have been built for wheat including the expVIP
gene expression atlas which contains RNA-seq
data from >1,000 RNA-seq samples, including
diverse tissue types, developmental stages,
cultivars and environmental conditions (Borrill
et al. 2016; Ramírez-González et al. 2018).
A pictorial representation of gene expression 
across 70 different tissue-developmental stages 
is also available through the wheat eFP browser 
which provides a powerful tool for intuitive 
gene expression exploration (Winter et al. 2007; 
Ramírez-González et al. 2018). 


5.7 Response to Environmental
Stress


Transcriptome analyses are also a powerful tool 
to understand how wheat responds to different 
environmental stresses, including both abiotic 
and biotic stresses. Genome-wide scale changes 
in the transcriptome can be investigated by 
examining the transcriptome changes after the 
application of the stress or differences between 
plants with susceptible or resistant genotypes. 
The effect of single genes on the response can 
be investigated by comparing lines with precise 
genetic differences such as near-isogenic lines, 
overexpression or mutant lines. 


5.7.1 Genome-Wide Transcriptional
Responses to Stress Conditions


RNA-seq has been used to characterise gene 
expression changes in response to a wide range 
of environmental stresses from pathogen infec- 
tion (e.g. Zhang et al. 2014; Dobon et al. 2016) 
through to abiotic stresses including drought, 
heat, salinity and cold (e.g. Liu et al. 2015; Xiong 
et al. 2017; Li et al. 2018; Gálvez et al. 2019). 
Yellow rust infection is one of the best-studied
pathogen infections in wheat at the
transcriptional level.
Here we will explore insights that have been 
gained using RNA-seq to study rust infection, 
which may be widely applicable to other patho- 
systems and to other environmental interactions. 

Early studies using RNA-seq examined
temporal changes in gene expression in wheat 
(Zhang et al. 2014), or in both wheat and the 
fungal pathogen itself revealing temporal inter- 
actions between host and pathogen (Dobon 
et al. 2016). Comparisons between susceptible
and resistant lines have also proved fruitful.
Infection with a mixture of powdery mildew 
and leaf rust revealed that a specific set of 
genes were downregulated only in the suscep- 
tible line. These genes had functions related to 
programmed cell death and response to cel- 
lular damage, indicating that the two fungal 
pathogens evade the wheat defence system by
inducing transcriptional level changes (Poretti
et al. 2021). This agrees with earlier results
which examined a time course of RNA-seq in 
wheat plants infected with yellow rust. Immune 
response regulators were rapidly upregulated 
after yellow rust infection, but this upregula- 
tion was suppressed in subsequent timepoints. 
Only in resistant interactions was this suppres- 
sion alleviated, while in susceptible reactions 
the immune response regulators continued to be 
suppressed (Dobon et al. 2016). This parallels 
the findings of Poretti et al. (2021) that specific 
suppression is required in susceptible wheat 
lines for successful infection. 

Transcriptomics studies are also now lead- 
ing to the identification and functional char- 
acterisation of genes involved in pathogen 
resistance and susceptibility. Corredor-Moreno 
et al. (2021) used data from 68 pathogen-infected
wheat varieties to investigate genes
which influence wheat rust susceptibility. Since 
samples were collected from different varieties, 
growth conditions and developmental stages, 
the authors clustered gene expression profiles 
to identify genes linked to yellow rust suscepti- 
bility. This reduced the amount of background 
differentially expressed genes which are not 
involved in the infection response, but instead 
are linked to variety, growth condition or devel- 
opmental stage. By focussing on clusters which 
showed strong expression differences between 
the most and least susceptible cultivars, suscep- 
tibility-associated genes were identified. These 
susceptibility-associated genes were enriched
for branched-chain amino acid (BCAA) biosynthetic
genes. Comparison with publicly available
data highlighted the gene branched-chain
aminotransferase 1 (TaBCAT1) as a candidate
gene, which was ultimately validated as a sus- 
ceptibility gene using mutant lines. This study 
highlights a new way of identifying genes with 
roles in infection response and shows the poten- 
tial genetic variation we can find beyond the 
pairwise comparisons of lines with different sus- 
ceptibilities, which is the more routine approach. 


5.7.2 Elucidating Biological 
Mechanisms of Stress- 
Associated Genes Using 
Transcriptomics 


It is becoming increasingly routine to characterise
lines with phenotypic alterations in stress
responses using RNA-seq. This can provide 
insight into the molecular pathways through 
which a gene involved in stress responses oper- 
ates and identify future breeding targets down- 
stream in the process. 

Taking drought stress as an example, several 
studies have recently associated NAC transcrip- 
tion factors with drought tolerance and stud- 
ied the pathways through which they act. The 
first NAC gene (TaSNAC8-6A) improved seedling-stage
drought tolerance (Mao et al. 2020).
RNA-seq analysis in roots showed that even 
under well-watered conditions, genes with GO 
terms associated with drought, auxin and ABA 
responses were upregulated in lines overexpressing
this gene. Under drought conditions, more
genes associated with drought, auxin and ABA
responses were upregulated in the overexpression
line than under well-watered conditions. The
authors hypothesise that these changes enhance 
root development and increase water use effi- 
ciency, leading to increased drought tolerance. 
The second NAC (TaNAC071-A) increased
yield under drought conditions by increasing
water use efficiency (Mao et al. 2022). RNA-seq
in leaves revealed that stress-responsive
pathways such as response to abscisic acid and
response to osmotic stress were upregulated in
lines overexpressing this NAC. Furthermore,
orthologs of well-established drought-inducible
genes were upregulated in the overexpression
lines, including genes involved in stomatal closure,
suggesting that TaNAC071-A may increase
drought tolerance by more quickly closing the
stomata and reducing the transpiration rate.
Interestingly, a separate study revealed through
RNA-seq that increasing stomatal closure under
drought is a common mechanism controlled by
NAC transcription factors in wheat (Ma et al.
2022).


5.8 Limitations of Current
Transcriptomic Studies


A common limitation in many species is that 
RNA-seq has generally been carried out on 
pooled tissue which results in the loss of a large 
amount of potential information from single 
cells or individual tissue types. For example, 
by sampling a whole leaf and grinding it up 
prior to RNA extraction, the generated expres- 
sion profiles are an average across many cell 
types. Therefore, any spatial differences in expression
within a tissue cannot be observed. Until
recently, large quantities of RNA were needed 
for RNA-seq; therefore in order to study spe- 
cific cell/tissue types, labour-intensive meth- 
ods had to be used to gather large quantities of 
material such as aleurone and endosperm from 
developing grain (Pfeifer et al. 2014) and devel- 
oping meiocytes (Martin et al. 2018). However, 
the development of low input RNA-seq meth- 
ods now allows gene expression studies with 
much reduced sample collection requirements 
and enables studies on very small tissue sam- 
ples which were not feasible before. Low input 
methods were used by Backhaus et al. (2022) 
to investigate the gene expression patterns in 
different regions of the developing spike. The 
developing spike was dissected at double ridge 
and glume primordia stage into three sections 
(apical, central, basal) for sequencing, with- 
out any pooling of different samples required. 
Surprisingly Backhaus et al. (2022) found that 
the largest differences in the transcriptome were 
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between the basal and apical sections, rather 
than between different consecutive timepoints 
of development. The discovery that position 
has a stronger effect than the developmental 
time point could not have been made by doing 
bulk-RNA-seq of the whole spike, as has been 
done by previous studies (e.g. Feng et al. 2017), 
uncovering the unique and powerful information 
available using this low input approach. 
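
The loss of spatial information in pooled tissue can be shown with a toy calculation. This is a minimal sketch, assuming that each section contributes equally to a pooled sample; the gene names and expression values are hypothetical and are not taken from Backhaus et al. (2022).

```python
from statistics import mean

# Hypothetical expression values (arbitrary units) for two genes measured
# separately in three sections of a developing spike.
sections = {
    "apical": {"geneA": 90.0, "geneB": 30.0},
    "central": {"geneA": 30.0, "geneB": 30.0},
    "basal": {"geneA": 0.0, "geneB": 30.0},
}


def bulk_profile(section_profiles):
    """Average each gene across sections, as a pooled sample would.
    Assumes every section contributes the same amount of RNA."""
    genes = next(iter(section_profiles.values())).keys()
    return {
        gene: mean(profile[gene] for profile in section_profiles.values())
        for gene in genes
    }


def spatial_range(section_profiles, gene):
    """Difference between the highest and lowest section value for a gene."""
    values = [profile[gene] for profile in section_profiles.values()]
    return max(values) - min(values)


bulk = bulk_profile(sections)
for gene in ("geneA", "geneB"):
    print(gene, "bulk:", bulk[gene], "range:", spatial_range(sections, gene))
```

In the pooled profile the two genes look broadly similar, yet only the per-section data show that geneA follows a steep apical-to-basal gradient while geneB is uniform.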

While the ability to sequence small samples 
is a major step forwards, resolution at the single- 
cell level is now being applied in other plant 
species such as Arabidopsis (Thibivilliers et al. 
2020). However, single-cell RNA-seq (scRNA- 
seq) still has limitations including the complex- 
ity of the method itself, mainly the capture of 
single cells (Chen et al. 2019) and the risk of 
overamplification based on the small amount of 
RNA provided from a single or small number 
of cells (Hrdlickova et al. 2017). However, the 
main issue for scRNA-seq in plant transcrip- 
tomics is the need to degrade the cell wall, with 
the different compositions and types meaning 
different protocols are required (Thibivilliers 
et al. 2020). The application of scRNA-seq will 
present new opportunities for wheat research, 
and success in applying this method to mono- 
cots such as rice and maize (e.g. Xu et al. 2021; 
Zhang et al. 2021) lays the groundwork for future
studies. 

A second key limitation of many studies to 
date has been the use of glasshouse and con- 
trolled environment conditions, to minimise var- 
iations in transcriptome changes due to factors 
other than what is being experimentally manipu- 
lated. However, this is not necessarily indica- 
tive of gene expression during development or 
responses to stress in the field environment. It is 
becoming increasingly important to understand 
gene expression in real-world fluctuating envi- 
ronments, and field-based studies are becom- 
ing more common (e.g. Quijano et al. 2015; Li 
et al. 2018; Corredor-Moreno et al. 2021). Field- 
based studies can develop increased insight 
into biological pathways and provide important 
information for breeding. For example, a field-based
experiment revealed multiple interacting
pathways that influence cold tolerance in
preparation for over-winter stress, and these complex
interactions may have been missed in controlled
environment conditions where changes
are often abrupt (Li et al. 2018). However, varia- 
bility in gene expression caused by environmen- 
tal influence can be strong and make analysing 
changes due to a single gene difficult, as was 
found for the powdery mildew resistance allele 
Pm3b (Quijano et al. 2015). Therefore, research- 
ers will need to assess the relative benefits of the 
realistic nature of gene expression under field 
conditions against the potential pitfalls for each 
experiment. 


5.9 Constructing Gene Networks
for Hypothesis Generation
and Candidate Gene
Identification


Although comparisons of gene expression 
between samples at different timepoints or 
in different environmental conditions can be 
informative, applying network approaches to 
understand gene interactions and pathway-level 
responses to environmental and developmen- 
tal changes is a complementary and powerful 
approach. Networks can integrate a wide range 
of information from gene expression and co- 
expression through to protein-level interac- 
tions and scientific literature links (Hassani-Pak 
et al. 2016), but here we will focus on gene
networks built mainly from gene expression 
measurements. 


5.9.1 Co-expression Networks

Co-expression networks can be built from
thousands of genes using the similarity in their 
expression patterns across multiple conditions to 
determine which genes are grouped (Fig. 5.3a). 
Based on “guilt-by-association”, genes that
belong to the same co-expression group are 
often considered to be co-regulated, for example 
by shared transcription factors, and to be part of 
the same biological process. 
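
The grouping principle can be sketched in a few lines. This is a minimal illustration, assuming Pearson correlation as the similarity measure and a fixed cut-off of 0.9; both are assumptions made for the example rather than the settings of any study cited here, and the gene names and expression values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sd_x * sd_y)


def coexpression_edges(expression, threshold=0.9):
    """Link every pair of genes whose profiles correlate at or above threshold."""
    edges = []
    for gene_1, gene_2 in combinations(sorted(expression), 2):
        r = pearson(expression[gene_1], expression[gene_2])
        if r >= threshold:
            edges.append((gene_1, gene_2, round(r, 3)))
    return edges


# Hypothetical expression values across five conditions.
expression = {
    "known_root_regulator": [1.0, 2.0, 3.0, 4.0, 5.0],
    "uncharacterised_1": [2.1, 3.9, 6.2, 8.0, 9.8],
    "uncharacterised_2": [5.0, 4.0, 3.0, 2.0, 1.0],
    "housekeeping": [3.0, 3.1, 2.9, 3.0, 3.1],
}

edges = coexpression_edges(expression)
print(edges)
```

Only the pair formed by the known regulator and the first uncharacterised gene passes the cut-off, so under guilt-by-association that gene would become a candidate for the same process. The sketch keeps positive correlations only; whether to include negative correlations is a separate design choice.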


An important application of gene co-expres- 
sion networks is the functional annotation 
of uncharacterised genes (Serin et al. 2016).
The development of a high-quality reference
sequence for wheat enabled the generation of
detailed co-expression networks focussing on
specific wheat tissues (leaf, grain, root and
spike) and stress conditions (abiotic and biotic)
(Ramírez-González et al. 2018). A comparison
of the four tissue-specific networks revealed 
modules of genes which were uniquely co- 
expressed in the root including several genes 
whose orthologs regulate root development in 
Arabidopsis. The other genes present in these 
root-specific modules represent novel genes 
that according to “guilt-by-association” may 
play roles in root development. Additional studies
have used co-expression networks to iden-
tify candidate genes involved in meiosis, grain 
development and flowering time pathways 
(IWGSC et al. 2018; Alabdullah et al. 2019; Chi 
et al. 2019). 

While these studies showed the potential of 
co-expression networks to identify candidate 
genes associated with a biological process of 
interest, functional validation of newly identi- 
fied genes was lacking. The value of these pre- 
dictions has been illustrated in wheat using the 
disease-related network generated by Ramírez- 
González et al. (2018). Polturak et al. (2022)
revealed that the top pathogen-induced mod- 
ules contained multiple clusters of physically 
adjacent genes that correspond to six pathogen- 
induced biosynthetic pathways. Heterologous 
expression of these co-expressed genes in 
Nicotiana benthamiana produced flavonoids and 
terpenes that may play a role in defence signal- 
ling or as phytoalexins. This study shows the 
power of co-expression to assign functions to 
previously uncharacterised genes. 

Several online tools have been developed 
which allow wheat researchers to identify genes 
that are co-expressed. WheatOmics allows users 
to search for genes co-expressed with a gene of 
interest in either grain or multi-tissue co-expres- 
sion networks (Ma et al. 2021) and KnetMiner 
integrates information about co-expression 
from a network built using 850 wheat RNA-seq
samples with a meiosis-specific co-expression
network (IWGSC et al. 2018; Alabdullah et al.
2019; Hassani-Pak et al. 2021). Online tools
are also available to construct co-expression
networks using custom datasets, such as unpublished
RNA-seq data, including CoExpNetViz
(Tzfadia et al. 2016) and Gene Network
Construction Tool Kit (GeNeCK) (Zhang et al.
2019).

Fig. 5.3 Graphical representation of gene networks.
a Gene co-expression networks group genes with similar
expression patterns across multiple conditions.
Interactions between genes (circles) can be direct or indirect.
b Gene regulatory networks represent direct interactions
between genes with directionality. In the example
here, a transcription factor (TF; yellow pentagon) is
expressed earlier in time and binds to the promoter sites
of two downstream genes (blue); the regulatory network
on the right shows the directionality of these interactions
(arrowheads)


5.9.2 Gene Regulatory Networks 


In contrast to co-expression networks, the links 
within gene regulatory networks (GRNs) represent
direct gene interactions rather than the
association of expression patterns (Fig. 5.3b). 
GRNs can be built using transcriptome data 
alone, or they can incorporate additional data 
types for transcription factor-DNA interactions 
which inform the network structure (reviewed in
Ko and Brandizzi 2020). GRNs typically have a
scale-free network architecture with a few hub 
genes with multiple connections to other genes 
and many poorly connected nodes (Barabasi 
and Oltvai 2004). The hub genes act as master 
regulators of a GRN and play important roles in 
biological systems and therefore identifying and 
manipulating hub genes may enable the manipu- 
lation of a biological process of interest. 
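
Identifying hub genes amounts to ranking nodes by how many connections they have. The sketch below assumes the network is available as a list of (regulator, target) pairs and uses total degree with a hypothetical cut-off as the hub criterion; published analyses may use other centrality measures, and the edge list here is invented for illustration.

```python
from collections import Counter


def node_degrees(edges):
    """Count connections per gene, adding incoming and outgoing edges."""
    degrees = Counter()
    for regulator, target in edges:
        degrees[regulator] += 1
        degrees[target] += 1
    return degrees


def hub_genes(edges, min_degree):
    """Return genes with at least min_degree connections, sorted by name."""
    return sorted(
        gene for gene, degree in node_degrees(edges).items()
        if degree >= min_degree
    )


# Hypothetical regulator -> target edges of a small regulatory network.
edges = [
    ("TF1", "geneA"),
    ("TF1", "geneB"),
    ("TF1", "geneC"),
    ("TF1", "TF2"),
    ("TF2", "geneD"),
    ("TF3", "geneD"),
]

degrees = node_degrees(edges)
print(degrees.most_common(3))
print(hub_genes(edges, min_degree=3))
```

Here most genes have one or two connections while TF1 has four, which mirrors on a small scale the few-hubs, many-poorly-connected-nodes pattern described above.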

GRNs in wheat have been used to generate 
hypotheses about gene function and to iden- 
tify hub genes which have a strong influence 
on a biological process. A large GRN was built 
using 850 RNA-seq samples to predict transcrip- 
tion factor-target interactions using the machine 
learning-based GENIE3 algorithm (Huynh-Thu 
et al. 2010). To test the validity of the transcription
factor targets identified by GENIE3,
Harrington et al. (2020) compared the target
genes of the senescence-regulating transcription
factor NAM-A1 to genes differentially expressed
in nam-a1 mutant lines compared to wild type.
The NAM-A1 target genes predicted by GENIE3
overlapped considerably with the differentially
expressed genes in lines with reduced NAM-A1
expression, indicating that GENIE3 can provide 
biologically relevant predictions. Furthermore, 
additional senescence-associated transcription 
factors were identified by combining GENIE3 
target information with independent senescence- 
related expression data. Similarly, combining 
the GENIE3 network with co-expression net- 
works enabled the identification of candidate 
genes involved in root development and stress 
responses (Ramírez-González et al. 2018). 

While the GENIE3 approach relies upon 
diverse RNA-seq samples from different tissues 
and conditions, GRNs have also proved valu- 
able to understand developmental timeseries in 
wheat. A ten-timepoint time course of flag leaf 
senescence was sampled and the resulting RNA- 
seq data was used to construct a GRN using the 
time-aware causal structure inference algorithm 
(Penfold and Wild 2011; Borrill et al. 2019).
Filtering the GRN for highly connected and
central hub genes identified known senescence
regulator NAM-A1 amongst the 36 top-ranked
genes, indicating that this approach identified
biologically relevant genes. Functional validation
of NAM-A2, another top-ranked gene and an
uncharacterised paralog of NAM-A1, showed the
power of this approach to identify genes regulat- 
ing senescence. 


5.9.3 Limitations of Gene Networks 


The first attempts to use gene networks in 
wheat have focussed on hypothesis genera- 
tion and identifying candidate genes involved 
in a biological process of interest. While use- 
ful insights have been gained, there is still more 
work to be done to fully leverage the power of 
gene networks. To date, most gene networks in 
wheat have been built using gene expression 
data, although some other types of information 
are incorporated into tools such as KnetMiner
and inetbio (Lee et al. 2017; Hassani-Pak et al. 
2021). In other species, the accuracy of net- 
works has been improved by incorporating addi- 
tional data sources such as transcription factor 
binding sites, open chromatin regions and pro- 
tein-protein interactions (reviewed in Haque 
et al. 2019; Ko and Brandizzi 2020). In wheat, 
these types of data are becoming available, for 
example with the publication of accessible chro- 
matin regions identified by ATAC-seq (Concia 
et al. 2020) and this information could be incor- 
porated into future networks to improve the pre- 
dictive ability. 

A second challenge is the validation of gene 
networks in wheat. In model systems, comparison
to “gold standard” networks allows
the accuracy of different network construction 
methods to be determined (Marbach et al. 2012). 
However, in wheat, we know little about the true 
topology of gene networks so validation using 
this approach is not possible. Instead, network 
predictions can be validated on an individual 
gene basis by examining mutant or gene-edited 
lines for predicted phenotypic effects (Borrill 
et al. 2019). Alternatively, gene interactions
in the network could be tested using molecular
biology approaches. Another promising
approach is to integrate several different network
construction approaches which can boost
the breadth and accuracy of gene interactions in 
biological networks (Marbach et al. 2012). 

A final issue which affects wheat gene net- 
works is that having a large polyploid genome 
with >110,000 genes presents practical challenges
for some GRN construction techniques.
Although co-expression can be carried out on
thousands of genes simultaneously (e.g. IWGSC
et al. 2018; Ramírez-González et al. 2018),
some widely used GRN approaches only permit 
tens to hundreds of genes due to computational 
constraints. One method to circumvent this 
limitation is to filter genes likely to be of inter- 
est before entering them into the GRN to reduce 
the number of genes (e.g. Borrill et al. 2019). 
Alternatively, some algorithms such as GENIE3 
can use tens of thousands of genes as input, 
although the computational steps take several 
weeks on a high-performance computing cluster,
so this approach will not be accessible to
all.
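
The scale of the problem, and the effect of pre-filtering, can be made concrete with a short calculation. This is a sketch only: ranking genes by expression variance is an assumed, generic filter chosen for illustration and is not the criterion used in the studies cited above, and the expression values are hypothetical.

```python
from statistics import pvariance


def n_directed_pairs(n_genes):
    """Number of possible regulator -> target pairs among n_genes genes."""
    return n_genes * (n_genes - 1)


def top_variable_genes(expression, n_keep):
    """Keep the n_keep genes whose expression varies most across samples.
    Variance is an assumed stand-in for 'likely to be of interest'."""
    ranked = sorted(
        expression,
        key=lambda gene: pvariance(expression[gene]),
        reverse=True,
    )
    return ranked[:n_keep]


# Hypothetical expression values across four samples.
expression = {
    "g1": [1, 1, 1, 1],
    "g2": [0, 10, 0, 10],
    "g3": [2, 3, 2, 3],
    "g4": [5, 5, 6, 5],
}

kept = top_variable_genes(expression, n_keep=2)
print(kept)
print(n_directed_pairs(110_000), "possible pairs before filtering")
print(n_directed_pairs(500), "possible pairs among 500 retained genes")
```

Because the number of candidate interactions grows roughly with the square of the number of genes, even a coarse filter reduces the search space by several orders of magnitude.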


5.10 Conclusions and Future 
Outlook 


The use of transcriptomics has greatly increased 
in wheat over the past few years, benefitting 
from a high-quality genome annotation and 
decreasing sequencing costs. Accurate gene 
models now simplify the analysis of transcrip- 
tomic data and increase the value of the bio- 
logical information gained. While traditional 
studies have focussed on understanding changes 
in gene expression in response to environmental
stresses or developmental changes, there are
increasingly varied applications of RNA-seq,
from identifying candidate genes by surveying 
genetically diverse populations through to build- 
ing gene regulatory networks for hypothesis 
generation. Rapid developments in technolo- 
gies for transcriptomics will enable us to deepen 
our understanding of wheat biology, for example
by uncovering high-resolution gene expression
patterns.

Abstract


Common wheat is a hexaploid species
that is widely recognized as an important
staple food crop. The establishment of
gold standard reference genome sequences
of the well-studied CHINESE SPRING
and its progenitors (including Triticum turgidum
ssp. dicoccoides accession Zavitan,
Triticum durum accession Svevo, Triticum
urartu, Aegilops tauschii) in the last 5 years
has dramatically advanced our understanding
of wheat genome diversity and evolution
through the resequencing of collections
of wheat and its progenitors. In this chapter,
we review progress in the analysis and
interpretation of genome-based studies of 
wheat focusing on geographic genome dif- 
ferentiation, interspecies gene flow, haplo- 
type blocks, and gene diversity in breeding. 
We also consider approaches for efficiently 
discovering and integrating the genes and 
genome variations, hidden in Genebank col- 
lections, into wheat breeding programs. 

6.1 Wheat Origin and Spread
in the World


Common wheat (Triticum aestivum L.) provides 
approximately 20% of the total calories for 
human intake globally. Common hexaploid
wheat originated through natural
crosses between cultivated emmer (Triticum
turgidum, AABB) and Aegilops tauschii (DD),
and wheat is considered to be the first crop
domesticated in the “hilly flanks of the Fertile Crescent”
in southwestern Asia between 10,000 and 7000
BC (Feldman and Levy 2012). Key advances for
the domestication process included the absence 
of head brittleness and free-threshing grains. 
The dispersal of wheat selections prior to the 
5th millennium BC was extensive as several 
Triticum taxa spread from the Fertile Crescent 
westwards across central Europe and along 
the northern coastal line of the Mediterranean 
(Fig. 6.1). To the East, wheat is documented 
in archaeological records to be present in 
Turkmenistan and Pakistan before 5000 BC. It 
was introduced into west China in 2000 BC and 
into central and east China in approximately 
1500 BC (Liu et al. 2016, 2019a, b), based on
archaeological discoveries.

Fig. 6.1 Map showing hypothesized dispersals of
domesticated Triticum and Hordeum taxa (i.e. wheat and
barley) originated in southwestern Asia across the Old
World dating between 5000 and 1500 cal BC. Black circles:
sites older than 5000 BC; gray circles: sites dated
between 5000 and 2500 BC; white circles: sites dated
between 2500 and 1500 BC; solid line: parsimonious
inference from botanical evidence from dated archaeological
context (the density of which varies greatly across
Eurasia). Map is originally presented in Liu et al. (2019a,
b), modified with permission

As wheat colonized new and distinct
environments, it eventually replaced
native crops as the staple crop, and
field-level selection produced traits with very strong
geographic character to suit local cultivation
and consumption by different human populations.
These genetic changes have largely been
retained in the genome variation between cultivars,
especially the landraces. Establishment
of the so-called gold standard wheat genome
sequence, taken together with the assemblies
of the reference genomes of its progenitor species
as well as other hexaploid varieties (Avni
et al. 2017; Luo et al. 2017; Zhao et al. 2017;
Ling et al. 2018; IWGSC 2018; Zhu and Luo
2021), has provided the basis for high-density
SNP-chips and resequencing analyses. Advances
such as the production of SNP-chips with 90,
285 and 660 K SNPs have provided tools that are
now widely used to elucidate genome diversity.
Germplasm exchange and the development of 
genomics now provide a new opportunity to re- 
evaluate and reconsider the evolution and dis- 
persal process of wheat from this new point of 
view. These works also pave the way for asso- 
ciating the allelic variation with phenotypes for 
physical mapping of variation in the genome 
(Varshney et al. 2021). 


6.2 Global Distribution
of Wheat Genome Diversity
and the Leading Role of 3B
in Geographic Differentiation


Genotyping 632 world wheat landraces
with 285 K SNP array markers on chromosome
3B allowed Paux et al. (2008) to
define very strong geographic differentiation
(Fig. 6.2). In a follow-up diversity analysis
of Chinese wheat landraces using a 660 K SNP
array, we found they could be basically classified
into two sub-groups, the north-China sub-group
and middle-south China sub-group (Wang
et al. 2021). Among the 21 chromosomes, 3B
and 7A were particularly prominent in being
associated with the stratified domestication in
China, based on the standard Fst values for SNP
allele frequencies that differentiate populations
in two groups, namely Triticum aestivum-L1 and
T. aestivum-L2 (Fig. 6.3).

Fig. 6.2 Haplotypes in landraces on 3B and their global
distribution (provided by Dr. Etienne Paux, INRAE).
The red, pink, blue, and green dots refer to different
haplotypes (see Paux et al. 2008) and the clustering of
the different colored haplotypes across the landscape
from Europe to China is evident

When the analysis of differentiation between
T. aestivum-L1 and T. aestivum-L2 was narrowed
down to the crucial regions of
280-375 Mb on 3B and 211.7-272.9 Mb on 7A
in the CS reference 1.0 (IWGSC 2018), the Fst
reached 0.84 and 0.66, respectively (Fig. 6.3a;
quantified in Fig. 6.3b), and these regions were associated
with grain size and length in multi-environment
BLUP phenotype data (Wang et al. 2021).
Accessions in T. aestivum-L1 were mainly distributed
in northwestern China, whereas those
in T. aestivum-L2 were mainly from central
to eastern China (Fig. 6.4). The most distinct
agronomic trait was grain size (TKW), i.e., the
T. aestivum-L2 accessions usually had smaller
grain size than the T. aestivum-L1 accessions,
which was achieved by reduction in grain length
(Wang et al. 2021).
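
To give a sense of what Fst values such as 0.84 and 0.66 imply, the statistic can be computed for a single SNP from allele frequencies in two groups. The sketch below assumes the heterozygosity-based definition Fst = (Ht - Hs) / Ht for a biallelic SNP with two equally weighted populations; the exact estimator used in the cited study is not stated here, and the allele frequencies are hypothetical.

```python
def fst_two_populations(p1, p2):
    """Fst for one biallelic SNP from the reference-allele frequencies p1 and
    p2 of two equally weighted populations, as (Ht - Hs) / Ht."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within groups
    if h_total == 0:
        return 0.0  # SNP is fixed for the same allele in both groups
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies in two landrace groups.
print(fst_two_populations(0.5, 0.5))  # identical frequencies
print(fst_two_populations(0.9, 0.1))  # strongly differentiated
print(fst_two_populations(1.0, 0.0))  # fixed for alternative alleles
```

Under this definition a value near 0.64 already requires allele frequencies of about 0.9 versus 0.1, so regional values of 0.66 to 0.84 indicate that the two groups carry largely different alleles across the interval.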

Haplotype analysis in genotyped collections 
including wild emmer, domesticated emmer, 
common wheat landraces, and Chinese modern 
cultivars based on the 660 K SNP genotyped 
data clearly revealed wild emmer (WE) was the 
donor for the hap-block in L1 (see Haps 1, 2, 4, 
and 12 in Fig. 6.4). This is consistent with the 
suggested intercross and genome introgression 
between common wheat and wild emmer (He 
et al. 2019; Cheng et al. 2019). 

GWAS based on the multi-year agronomic trait
phenotypes revealed a strong association of the
crucial region on 3B (280-375 Mb) with spike
length (−log10(p) 5.0). In Chinese landraces,
the northwest haplotype-group (L1) usually has
longer spikes and larger grains than the southeast
haplotype-group (L2). Breeding selection in the
seven decades from 1950 to 2020 favored the
L1-haplotypes from the wild emmer (Fig. 6.4;
Hao et al. 2020; Wang et al. 2021). Therefore,
we estimate that this genomic region might also
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Fig.6.3 Very strong geographic and genetic differen- 
tiation happened in Chinese wheat landraces, forming 
two subsets, L1 (blue) and L2 (red). The 3B and 7A lead 
the differentiation among the 21 chromosomes. a Quite 
distinct distribution of collections in L1 and L2. b The 
Fst value between L1 and L2 along the 21 chromosomes,
which was estimated based on the 660 K SNP mark- 
ers using CS R 1.0 as reference. c and d. The triangles/ 


arrows indicate Fst values between L1 (blue) and L2
(red), L1 and modern cultivars (M, green), and L2 and M
on 3B and 7A. The red lettering along the arrows focuses
on the crucial genomic regions (3B: 280-375 Mb) and
(7A: 211.7-272.9 Mb). The data along the dashed circles
were Fst values between subsets in whole genome of
wheat (adapted from Wang et al. 2021)
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Fig.6.4 Haplotype network based on SNPs on 3B 
in cultivated emmer (CE), domesticated emmer (DE), 
Northeast landrace group 1 (L1), Southeast landrace 
group 2 (L2), modern Chinese cultivar (MC), and wild 


relate to the nitrogen use efficiency (NUE) or water
use efficiency (WUE) of cultivars. The great
increase of L1-haplotypes (including Haps 1, 2,
4, and 12) in the modern Chinese cultivars (MC)
at this genome location also correlated with the
change in cooking style from whole grain
historically to wheat flour products today, because
small grain was favored in whole-grain cooking, but
larger grain was favored in flour-product
consumption because of higher yield (Liu et al.
2014, 2016). Based on the analysis of the 10+
pangenome, we found that large structural variation
(SV) exists in this region across the 3B
centromere (Fig. 6.5).


6.3 Frequent Gene Flow Between Species and Its Effects on Diversity


In Israel, wild emmer wheat with intermediate 
phenotypes grew at the boundaries of cultivated 
areas. These wild plants may have originated 




emmer (WE). Circle size is proportional to the number of 
accessions for a given haplotype. The short lines between 
two nodes indicate the number of mutations 


from hybridization of wild emmer with T. turgidum
cultivars. They are indicative of gene flow
between wild and domesticated populations
(Matsuoka 2011). Dvorak et al. (2006) provided
initial molecular evidence for the existence
of introgressions from wild emmer (Triticum
dicoccoides) into common wheat, which was
indirectly supported by the fact that wild emmer
usually existed as an accompanying weed of
durum and common wheat in wheat
origin/domestication regions. Hexaploid and
tetraploid wheats were also cultivated as a mixture
in the field in these regions (Matsuoka 2011).
The overall consequence was that the mixed
cropping provided opportunities for gene flow
between species through natural hybridization.

The identity score (IS) is widely used to
reveal the parents' genetic contribution to their
derived cultivars in breeding. The IS is defined
with reference to similar nucleotide sequences
present in two, or more than two, individuals




Fig.6.5 GWAS based on 660 K SNP array with multi- 
year phenotyping data of landraces (NP) and biparental 
population (BPP) indicated that high genome differen- 
tiation at 280-375 Mb was associated with spike length 


through replication of the same ancestral copy
of the respective sequences. Our IS graph analysis,
based on resequencing of common wheat (landraces
and modern cultivars), wild emmer, domesticated
emmer, and durum wheat, revealed frequent genomic
introgressions between common wheat and wild
emmer, as well as cultivated emmer, where CS 1.0
was used as reference (light blue, IWGSC 2018).
As expected, introgressions were even more
frequent between the tetraploid species, because
they share the common genome AABB (Fig. 6.6a).
Independent research by two other groups also
revealed the wide existence of introgressions from
wild emmer into common wheat (Fig. 6.6b) (Cheng
et al. 2019; He et al. 2019). In addition, global
wheat diversity research was strongly promoted by
the establishment of gold-standard reference
genomes of common wheat, T. dicoccoides, and
Triticum durum (IWGSC 2018; Avni et al.
2017; Maccaferri et al. 2019; Pont et al. 2019;
Sansaloni et al. 2020), all of which were
successively refined with the integration of more
assemblies based on third-generation sequencing
reads (Zhu et al. 2021).
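
As a rough illustration of how an identity-type score between two accessions can be scanned along a chromosome, the sketch below computes the fraction of identical genotype calls per fixed-size window. This is an assumption for illustration only: the window size, the treatment of missing calls, and the function name are hypothetical and are not taken from the IS or 1-IBD pipelines cited in this chapter.

```python
from collections import defaultdict
from typing import Dict, Optional, Sequence


def window_identity(
    positions: Sequence[int],
    calls_a: Sequence[Optional[str]],
    calls_b: Sequence[Optional[str]],
    window_bp: int = 1_000_000,
) -> Dict[int, float]:
    """Fraction of identical SNP calls between two accessions per window.

    positions: SNP coordinates (bp) on one chromosome of the reference.
    calls_a, calls_b: homozygous allele calls, with None for missing data.
    Returns a mapping from window start (bp) to identity in [0, 1].
    """
    same = defaultdict(int)
    total = defaultdict(int)
    for pos, a, b in zip(positions, calls_a, calls_b):
        if a is None or b is None:  # skip sites missing in either accession
            continue
        window = pos // window_bp
        total[window] += 1
        same[window] += int(a == b)
    return {w * window_bp: same[w] / total[w] for w in sorted(total)}
```

Long runs of windows with identity close to 1 between a common wheat cultivar and a wild emmer accession, in a background of lower identity, would be the kind of pattern that flags a candidate introgression for closer inspection.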

Alien introgression usually reduces the
recombination frequency, leading to strong
LD in natural or breeding populations, which
results in a decline of diversity in the
respective genome regions. However, the SNP
density was usually increased because suppression


on 3B chromosome as indicated by the scores for Fst 
exceeding the significance cutoff, across the 280-375 Mb 
region 


of recombination has kept the introgression
fragments intact, and these fragments are rich in
SNPs in comparison with the CS reference
genome (Fig. 6.7). We found that the recombination
rate is much more even along the D sub-genome
chromosomes than along either the A or B
sub-genome chromosomes. This might be
caused by the introgressions from wild emmer,
which mainly exist within the A and B
sub-genomes of common wheat (Fig. 6.8). Sufficient
genome differentiation has presumably occurred
between the hexaploid and tetraploid AB genomes
to prevent recombination between the
"introgressions" and the original homologous
fragments. The great difference in recombination
rate across the centromeres between 7A and
7B in Chinese modern cultivars directly supports
our hypothesis that introgression suppresses
recombination, because a large introgression was
detected on 7A across the centromeric
region (Fig. 6.8, ca. 230-430 Mb; Cheng et al.
2019; Hao et al. 2020).


6.4 LD and Haplotype Blocks in Wheat Evolution and Breeding


Linkage disequilibrium (LD) is a common phe- 
nomenon in the population genetics analysis 
of crops. For a long time, it was believed that 
strong positive selection for a gene usually led 
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Fig. 6.6 Frequent genome introgressions between spe- 
cies in Triticum genus revealed by 1-IBD within the 
A sub-genome chromosomes, where CS 1.0 was used 
as reference and expressed in light blue. a Graph based 
on (1-IBD) indicated frequent introgressions from wild 
emmer to domesticated emmer and common wheat. It 


also revealed reverse introgression from common wheat 
to domesticated emmer and wild emmer. b Genomic 
introgressions detected in global common wheat on the 
14 chromosomes of A and B sub-genomes from four 
wild emmer populations into common wheat (adapted 
from Cheng et al. 2019) 
Fig. 6.7 A wild emmer introgression (~172-448 Mb)
and its effects on SNP density, recombination ratio
(ρ), and genome diversity (π) in comparison with the
neighbor region without introgression (468-530 Mb) on
chromosome 4A. CL: Chinese landrace, IMC: introduced
modern cultivars
Fig. 6.8 Recombination ratio (ρ) along the three chromosomes of homoeologous group 7 in Chinese wheat landraces
(blue) and modern cultivars (red). There is more recombination disequilibrium on 7A and 7B than on chromosome 7D
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to strong LD around the loci because of hitch- 
hiking effect. LD was usually affected by popu- 
lation diversity, selection pressure at the crucial 
loci, as well as recombination rate. We found
that chromosomes in the A and B sub-genomes have
larger and stronger LD blocks in wheat (Hao
et al. 2020). This might be caused by two
factors: (1) more QTLs controlling agronomic traits
are located on the A and B sub-genomes (Peng et al.
2003, 2011); and (2) recombination along the A and
B sub-genome chromosomes is partially suppressed
by the introgressions from wild T. turgidum
species (Figs. 6.6, 6.7 and 6.8).
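
For readers less familiar with how LD is quantified, the following minimal sketch computes the common pairwise statistic r² between two biallelic SNPs. It assumes 0/1-coded haplotypes, which is a reasonable simplification for largely homozygous, self-pollinated wheat accessions; the function name is hypothetical, and this is not necessarily the statistic or software used in the studies cited above.

```python
import math
from typing import Sequence


def ld_r2(snp_a: Sequence[int], snp_b: Sequence[int]) -> float:
    """Pairwise LD (r squared) between two biallelic SNPs.

    snp_a and snp_b hold one 0/1 allele code per accession, in the same
    accession order. Returns NaN if either SNP is monomorphic.
    """
    if len(snp_a) != len(snp_b) or len(snp_a) == 0:
        raise ValueError("SNP vectors must be non-empty and of equal length")
    n = len(snp_a)
    p_a = sum(snp_a) / n  # frequency of allele 1 at SNP A
    p_b = sum(snp_b) / n  # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(snp_a, snp_b) if a == 1 and b == 1) / n
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        return math.nan
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium D
    return d * d / denom
```

Two SNPs that always co-occur give r² = 1, and independent SNPs give r² near 0. A haplotype or LD block can then be thought of as a run of adjacent SNPs whose pairwise r² stays above a chosen threshold.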


6.4.1 Haplotype Block Size Along a Chromosome in Wheat


On each of the 21 chromosomes, five chromo- 
somal regions were defined by the IWGSC, 
based on the overall recombination pattern 
observed in wheat (IWGSC 2018). There are 
fewer but very large (> 10 kb, Fig. 6.9a example 
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Fig.6.9 Size difference of haplotypic blocks along 
wheat chromosomes using 2B as an example (adapted 
from Balfourier et al. 2019). The designations for the 
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for chromosome 2B) haplotype blocks at regions 
across the centromeres, and smaller haplotype
blocks at the R1 and R2 regions (Fig. 6.9). The
identification of the R1-R3 blocks of chromosome
regions in the wheat chromosomes is
based on the recombination rate characteristics,
gene density, and tissue-specific vs housekeeping
expression variation across each of the 21 wheat
chromosomes. R1 and R3 designate the
distal ends of the short and long chromosomal
arms, respectively; R2a and R2b designate the
interstitial regions of the short and long arms;
and C designates the pericentromeric
region (IWGSC 2018).

The box plots in Fig. 6.9 provide a statisti- 
cal assessment of the R1, R2a, R2b, and R3 
designations across the wheat genome based on 
recombination frequency and indicated that the 
difference between C and the terminal blocks 
was significant. Consistent with this significant 
difference in recombination frequency, Jordan 
et al. (2020) found that DNS scores assessing 
DNA accessibility to Micrococcal Nuclease 
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different chromosome regions in the upper panel derive 
from the overall recombination patterns observed in 
wheat (IWGSC 2018) 
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(MNase), and thus the more open or compacted 
state of the chromatin, were significantly higher 
(= more open chromatin) for the genome space 
in the R1 and R3 regions. 


6.5 Large Haplotype Blocks and Their Role in Breeding


Identification of haplotype blocks and large PAVs,
and tracking their variants in evolution and
breeding, are notable aspects of research on
self-pollinating crops in the current genomics era.
The investigation of gene-network contributions to
the well-studied thousand grain weight (TGW)
phenotype that contributes to yield in wheat, for
example, revealed an unexpected influence of
structural variation for the presence/absence of
the 5AS chromosome arm (Taagen et al. 2021).
A combination of transcriptome data and
high-resolution marker maps for the TGW QTL
initially thought to be on 5AL in fact indicated
that the QTL resulted from linkage to the
presence/absence of the 5AS arm. On a larger scale,
resequencing of 145 landmark cultivars in China
showed that there were more long-range haplotypes
on the A and B sub-genomes than on the D
sub-genome in common wheat
(Jordan et al. 2015; Hao et al. 2020). The first
reason is that gene flow occurred from the
wild tetraploid T. dicoccoides during early
co-cultivation of tetraploid and hexaploid wheat,
where wild emmer was also present as a weed
in wheat fields. This was therefore expected to
result in a substantial increase in polymorphism
on the A and B sub-genomes relative to the D
sub-genome in modern bread wheat. It also
partially suppressed homologous recombination
within the A and B sub-genomes at crucial genomic
regions because of differentiation in the
intergenic repeats among wild emmer, cultivated
emmer, and common wheat, leading to a very uneven
distribution of recombination ratio and SNPs along
chromosomes (Fig. 6.8). The second reason is the
asymmetric distribution of agronomic traits among
the three sub-genomes. There are more QTLs or
genes controlling domestication and yield traits
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mapped on the A sub-genome than either on B 
or D sub-genomes, leading to stronger selection 
on the A sub-genomes (Peng et al. 2011; Jordan 
et al. 2015). 

Haplotype-based breeding (HBB) can now
be proposed following the genome resequencing
of larger numbers of cultivars, because it
represents a promising breeding approach for
identifying superior haplotypes and deploying
them in breeding programs
(Varshney et al. 2021). We propose that for
self-pollinated crops with a long breeding history,
it will be possible to take advantage of hap-block
identification to select ideal parent materials to
achieve new high-performing cultivars via HBB
(Figs. 6.9 and 6.10).

We dissected diversity features along
chromosomes 6A (Fig. 6.10a) and 1A (Fig. 6.10b)
in cultivar subsets released pre- and
post-development of the two hallmark Chinese
cultivars AIMENGNIU (AMN) and XIAOYAN 6 (XY6)
based on their pedigrees. Fixation of a big
haplotype block from 224 to 442 Mb on 6A was
detected in post-XY6 cultivars, but relatively
higher diversity was retained in post-AMN
cultivars. On chromosome 1A, the haplotype block
from 100 to 300 Mb was fixed in post-AMN cultivars
but not in post-XY6 cultivars. This indicates that
the haplotype block carried by XY6 on 6A and that
carried by AMN on 1A provided sufficiently
high-quality attributes for breeders to retain
them. An interesting but less pronounced trend
was also found from 178 to 472 Mb on chromosome
2A, but in both XY6- and AMN-derived
new cultivars this genomic region maintained a
higher diversity. This indicates that the
haplotypes carried by either XY6 or AMN in this
region are not of sufficiently high quality for
breeders to retain them. The very large sizes of
the haplotype blocks also
highlight the feasibility of HBB in wheat.
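
The pre- versus post-release comparison above rests on measuring diversity within a region for each cultivar subset. A minimal sketch of one such measure, the mean unbiased expected heterozygosity per SNP, is given below. The 0/1 coding, the function names, and the toy genotypes are assumptions for illustration and do not reproduce the analysis of Hao et al. (2020).

```python
from typing import Sequence


def site_heterozygosity(alleles: Sequence[int]) -> float:
    """Unbiased expected heterozygosity at one biallelic SNP.

    alleles holds one 0/1 allele code per accession (homozygous lines).
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two accessions")
    p = sum(alleles) / n
    return 2.0 * p * (1.0 - p) * n / (n - 1)


def region_diversity(snp_matrix: Sequence[Sequence[int]]) -> float:
    """Mean per-SNP diversity over a region (rows = SNPs, columns = lines)."""
    if not snp_matrix:
        raise ValueError("region contains no SNPs")
    return sum(site_heterozygosity(row) for row in snp_matrix) / len(snp_matrix)


# Toy example: the same two SNPs scored in a "pre" and a "post" subset.
pre_release = [[0, 1, 0, 1], [0, 0, 1, 1]]
post_release = [[1, 1, 1, 1], [0, 0, 0, 0]]
print(region_diversity(pre_release), region_diversity(post_release))
```

In this toy case the post-release subset has zero diversity, which is the signature of a fixed haplotype block. In real data the same statistic would be computed in sliding windows along the chromosome for each subset.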


6.6 Human Selection and Gene Diversity


Cloning the gene responsible for a trait or QTL
and analyzing its natural variation to find
valuable new alleles is one major task for scientists
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Fig.6.10 Diversity features at key haplotype blocks 
considering cultivars subsets released pre- and post- 
development of the two hallmark Chinese cultivars 
AIMENGNIU (AMN) and XIAOYAN 6 (XY6) as well 
as within Chinese landraces (CL) and modern Chinese 


working in crop genetic resources. Fine mapping
of QTL through advanced backcross
QTL analysis was regarded as the most reliable
method for a long time (Tanksley and Nelson
1996). However, QTL mapping-based cloning
of genes in wheat is very time-consuming
because of the complexity and polyploid nature
of the genome. The gold-standard reference genome
sequence promotes gene cloning via GWAS
with the assistance of gene editing and
transformation in wheat. The successful mapping of
Rht24 through GWAS in large collections using
the CHINESE SPRING genome as reference
for 6A is a landmark for a gene mapping
strategy that complements the QTL fine
mapping in biparental recombination populations


cultivars (MCC). SNP similarity between AMN and 
XY6, population diversity for each sub-set (pre-AMN 
vs. post-AMN, pre-XY6 vs. post-XY6, MCC vs. CL) on 
chromosomes 6A (a), 1A (b). Adapted from Hao et al. 
(2020) 


with GWAS fine mapping in natural populations.
Through anchoring the flanking markers on
RefSeq v1.0, the candidate interval for Rht24
was narrowed down to a 50 Mb region between
400 and 450 Mb on chromosome 6A, which
has been actively selected in breeding since the
1990s (Würschum et al. 2017).

It is very hard to precisely map genes in
pericentromeric regions through recombination in
biparental populations. Through GWAS, however, we
can use long-term historic recombination to
map and dissect the crucial region.
For example, we found a grain thickness-associated
locus on the long arm of chromosome
5A marked by the peak SNP chr5A_430246395
(−log10(p) = 4.17) because the region was
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Fig. 6.11 GWAS make it possible to precisely map and 
verify genes in the low recombination region using the CS
gold-standard reference and historic recombination, with the
assistance of genome resequencing and gene editing. A
grain thickness locus on chromosome 5A carrying the
rice DENSE AND ERECT PANICLE ortholog TaDEP1.
a GWAS signals at 430.24 Mb on 5A. b Three
homoeologs of DEP1 and their mutated sites by CRISPR-Cas9


overlapping with selection sweeps and
contained the wheat homolog of the rice DEP1 gene
(Fig. 6.11), which has been shown in rice to enhance
grain yield by promoting nitrogen utilization
efficiency (Huang et al. 2009; Xu et al. 2019). The
LD block was ~1.3 Mb and contained 15 genes.
A total of 33 SNPs were present in the region.
Haplotype analysis of these SNPs showed that
the grains of accessions with haplotype
1 (Hap1) were significantly thicker (P < 0.001)
than those of other accessions (Fig. 6.11), and
the Hap1 accessions also had a significantly
higher average thousand grain weight
but reduced plant height. The locus in fact
shows pleiotropic effects on multiple agronomic
traits, and RNA-seq data showed that TaDEP1
was expressed at significantly different levels
(P < 0.001) between thin-grain and thick-grain
accessions.


in KENONG 199. c Seed size difference between the
triplet mutant (DEP1-MUT) and wild type (WT). d
Statistical difference between the DEP1-MUT and WT
in flowering time, grain length (GL), grain width (GW),
and grain thickness (GT). e and f Phenotypic difference
in plant morphology and spike. Adapted from Li et al.
(2022)


We then used CRISPR/Cas9 editing to introduce
deletion mutations into all three TaDEP1
homoeologs in cv. KENONG199. The edited plants
displayed significant (P < 0.01) reductions in grain
size. The edited mutants also showed shorter stems,
more tillers, and compact spikes (Fig. 6.11),
confirming that TaDEP1 is a gene with pleiotropic
effects and is functionally essential for wheat grain
size development (Li et al. 2022).
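
To make the GWAS numbers in this example easier to interpret, the short sketch below converts a −log10(p) score back to a p-value and parses the peak SNP identifier. It assumes that the identifier follows a `chr<chromosome>_<position in bp>` convention, as suggested by the 430.24 Mb signal on 5A in Fig. 6.11; the function names are hypothetical.

```python
from typing import Tuple


def parse_snp_id(snp_id: str) -> Tuple[str, int]:
    """Split an ID such as 'chr5A_430246395' into chromosome and bp position.

    The 'chr<chrom>_<pos>' layout is an assumed naming convention.
    """
    chrom_part, pos_part = snp_id.rsplit("_", 1)
    if not chrom_part.startswith("chr"):
        raise ValueError(f"unexpected SNP identifier: {snp_id}")
    return chrom_part[3:], int(pos_part)


def p_from_neg_log10(score: float) -> float:
    """Convert a -log10(p) association score back to a p-value."""
    return 10.0 ** (-score)


chrom, pos = parse_snp_id("chr5A_430246395")
print(chrom, pos / 1e6)        # chromosome and position in Mb
print(p_from_neg_log10(4.17))  # p-value of the peak SNP
```

A score of 4.17 corresponds to a p-value of roughly 6.8e-5. Whether that clears a genome-wide threshold depends on the number of markers tested and the correction applied, which is one reason the overlap with selection sweeps and the gene-editing validation matter for this locus.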

There are more PAVs and other SVs in common
wheat than in other crops. Therefore, if the
agronomic target is located in an SV region,
it is likely to be difficult to map the QTL
precisely using a biparental population, or even in
a natural population by GWAS using a single
reference genome. A graph of the pangenome for
functional genomics and HBB in wheat would
be a major advance.


6.7 Yield Genes and Their Diversity

For yield genes, because of their conserved
characters among cereals, much work was carried
out based on the synteny and collinearity
among cereal genomes, especially the good
collinearity between wheat and rice. Three
very interesting points were found. The first is
dominance among the three homoeologous genes in
the hexaploid species. The second is that most
of the natural variation occurred within the
promoter regions of the crucial genes among
cultivars. The third is a strong correlation of
haplotypes with the water and fertility of the soil
as well as sunlight and temperature resources in
the growing season. For example, GS5 was
recognized as a gene strongly influencing grain
size in cereals (Li et al. 2011), and in wheat it
is preferentially expressed in young spikes and
developing grains and positively regulates
grain size. Among the three homoeologous genes,
GS5-3D has the dominant expression, GS5-3B
is almost silenced with a very low expression
level, while GS5-3A has a medium
expression level (Fig. 6.12a). Only one SNP
(T/G) was identified in TaGS5-3A, at 2334 bp
downstream of the ATG start codon. Two
alleles were detected at GS5-3A in world modern
cultivars, with an average difference of 6-7 g in
thousand grain weight. The global distribution
of the frequency of the larger grain size allele
TaGS5-3A-T exhibited a very good correlation with
water resources during the wheat growing season
(Fig. 6.12b). No diversity was detected at either
the 3B or 3D loci (Ma et al. 2016).


6.8 Adaptation of Cultivars to Environments


Based on whether or not cold-temperature
vernalization is required to promote flowering,
wheat cultivars are classified into winter and
spring types. This vernalization requirement
prevents temperate plants from flowering under
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freezing winter conditions. Wheat cultivars 
grown in different environments need diverse 
vernalization characteristics to ensure flower- 
ing and reproductive development at appropri- 
ate time to meet the need for higher yield and 
mature on time. 

In wheat, flowering time is controlled jointly by
the vernalization system and the photoperiod
response system. For vernalization,
four genes, TaVrn1, TaVrn2, TaVrn3,
and TaVrn4, have been positionally cloned;
TaVrn4 was identified as a duplication of TaVrn1
(Yan et al. 2003, 2004, 2006; Kippes et al. 2015).
The expression level of TaVrn3 (FT) is the key
element determining whether or not flowering
occurs. However, expression of TaVrn3 is strongly
and positively regulated by TaVrn1, TaVrn4, and
PPD1, but negatively regulated by TaVrn2. Any
functional mutations in TaVrn1, TaVrn2, TaVrn4,
and PPD1 influence expression of TaVrn3, and
subsequently the flowering time. This provides
wheat with an extensive range of variation to
adapt to growing environments through combining
different alleles
at the four loci. Mutations in the promoter region
of VRN3 that result in the loss of a binding site for
VRN2 lead to complete loss of suppression of
VRN3 by VRN2 and result in a full spring type
wheat (Yan et al. 2003, 2004, 2006; Kippes et al.
2015). Furthermore, it was found that TaVrn1 had
significant epistatic effects on flowering time
(Xie et al. 2021). Copy number variation (CNV)
was also detected at VRN1 loci, which negatively
influences its own expression level (Diaz
et al. 2012). In addition, TFs binding to
cis-elements in the promoter regions of VRN1, VRN2,
and VRN3 often affect wheat heading and flowering
time (Liu et al., JIBP 20192). Furthermore, genes in
the auxin pathway were also involved in the
regulation of leaf senescence and remobilization of
nutrients from leaves and stems to grains in wheat
(Li et al. 2023). A detailed summary of the
vernalization system and photoperiod reaction
networks in wheat is provided by Sehgal et al. (see
Chap. 11 in this volume).
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Fig. 6.12 Dominance among the three GS5 homeol- 
ogy genes in wheat (a) and global distribution of alleles 
in modern cultivars (b). a Temporal and spatial expres- 
sion of TaGS5 homoeologues. SL, seedling leaf; SR, 
seedling root; HR, root at the heading stage; HS, stem at 
the heading stage; FL, flag leaf; 1 YS, 3 YS, 5 YS, and 7 
YS, young spikes of 1, 3, 5, and 7 cm in length; SP, spike 
at heading stage; various stages of grain development, 


including 1 DPA, 3 DPA, 5 DPA, 10 DPA, 15 DPA, 
20 DPA, and 25 DPA. The expression of TaGS5-3A 
in the spike at heading stage was assumed to be 1. b 
Distributions of TaGS5-3A-T and TaGS5-3A-G alleles in 
wheat cultivars from different ecological regions: North
America (I); CIMMYT (II); Europe (III); former USSR
(IV); China (V); and Australia (VI)
6.9 Disease-Resistant Genes and Their Diversity


In one life cycle, wheat is threatened by many
diseases and pests. The races of some pathogens,
such as the rusts and powdery mildew, change very
quickly from year to year. Wheat disease
resistance breeding is therefore a constant
evolutionary arms race with the pathogens, and
there must be enough diversity in R genes for this
race. Fortunately, disease resistance genes are
usually mapped to the high recombination regions
(R1 and R3, Fig. 6.9) of chromosomes. Frequent
recombination often creates new variation,
including PAV and CNV, which brings great
opportunity to create new genes for resistance.
Therefore, cloning R genes has been a high priority
in wheat molecular biology in the past 10 years and
is expected to continue to be a high priority.

Until now, three major types of disease
resistance genes have been cloned in wheat (see
also Chap. 10). (1) Resistance genes with typical
CC-NBS-LRR domains, such as the powdery
mildew resistance genes Pm1, Pm2, and
Pm3, the leaf rust resistance genes Lr1 and Lr13,
and the stem rust resistance genes Sr33 and Sr35.
(2) R genes containing kinase structures:
Yr36 has a START-kinase structure,
Pm4 has a kinase-MCTP structure, and
Yr15 (WTK1), Sr60 (WTK2), and Pm24
(WTK3) have a tandem kinase structure. (3)
Disease resistance genes encoding proteins with
transmembrane transport functions, such as the
durable resistance genes Lr34/Yr18/Sr57/Pm38
and Lr67/Yr46/Sr55/Pm46. There are rare
natural mutants in landraces carrying good
resistance genes, such as the famous Fhb1; Pm5e,
which encodes an NLR protein with an amino acid
mutation; and the powdery mildew resistance gene
Pm24 (WTK3), in which the deletion of two amino
acids confers broad-spectrum resistance to powdery
mildew. Besides common wheat collections, the
ancestral species of wheat usually contain
abundant disease resistance genes, such as Pm60
from T. urartu; Yr15 (WTK1), Yr36, and
Pm41 from T. dicoccoides; and Lr21 and the
powdery mildew resistance gene WTK4 derived
from Ae. tauschii. In addition, distantly related
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wild species were also good resources to trans- 
fer resistance genes to wheat, such as the Pm21 
from  Haynaldia villosa, conferring durable 
and broad-spectrum resistance to wheat pow- 
dery mildew (Xing et al. 2018). The Fhb7 from 
Thinopyrum elongatum has good resistance for 
fusarium head blight spreading in wheat (Wang 
et al. 2020). 


6.10 Prospects 


The value of germplasm resources is in the
genes hidden within them. The value of a gene
is determined by its activity per se as well as
the genetic background in which it is recovered.
Only by transferring genes from unadapted
germplasm into a good genetic background,
assaying their value, and integrating them into
breeding can we truly activate them and realize
their value for human life.

Genome segment introgression is a major
source of genetic variation in wheat. Genomic
regions of introgression have provided hot
spots for structural variation that contain many
dispensable genes, such as genes for tolerance to
biotic and abiotic stress. Wheat pangenomes will
enable genome-wide high-resolution admixture
mapping across species and help figure out
causal genetic mutations underlying specific
traits (Lei et al. 2021). Furthermore,
pangenome-based research on hallmark cultivars will
overcome the limitation of having a single
reference genome and reveal the important
contributions of chromosomal structural
variations (translocation, inversion,
duplication/deletion, PAV) to the formation of
variety traits. Therefore, a pangenome within and
across Triticum species will be of interest for
wheat genomics in the next 5-10 years for
interpreting and utilizing variation at the genome
level for breeding and evolution (Khan et al. 2020).

Crucial founder genotypes should be
sequenced by third-generation technology
and carefully annotated. Using the founder
genotype genome sequences as reference, a set of
genetically related cultivars can be sequenced by
cheaper second-generation sequencing technology
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to reveal the haplotype blocks contributed by the 
founder genotypes in their genomes. Tracking
markers could then be developed for
haplotype-based breeding. AB-NAM populations can
then be established by crossing the newest breeding
founder genotypes, as the recurrent parents, with
core collections and selectively backcrossing the
hybrids for 2-3 generations. It is envisaged
that intercrossing between good lines
from different AB-NAMs would be an efficient
strategy to integrate breeding-beneficial alleles
with desired haplotype blocks
to create super-lines for breeding (Tanksley
and Nelson 1996; Hao et al. 2020; Varshney
et al. 2021). In addition, tetraploid wheats (see
Chap. 8) would be an important and good gene
resource for the improvement of common wheat.
In view of the fact that the chromosome
segments from wild tetraploid wheat have a
suppressive effect on recombination, it is
recommended that in the construction of AB-NAM
populations, priority should be given to founder
genotypes containing more introgressions from
wild emmer to increase the recombination
ratio, to create more variation, and to efficiently
exclude the genetic drag from the wild species.
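
The choice of 2-3 backcross generations in the AB-NAM design can be related to the expected share of the recurrent-parent genome. The sketch below uses the standard expectation for unlinked loci without selection, 1 − (1/2)^(n+1) after n backcrosses. This is a textbook approximation and an assumption here; it ignores linkage, selection, and the recombination suppression around introgressions discussed in this chapter, and the function name is hypothetical.

```python
def recurrent_parent_fraction(n_backcrosses: int) -> float:
    """Expected recurrent-parent genome fraction after n backcrosses.

    n_backcrosses = 0 corresponds to the F1 hybrid (fraction 0.5).
    Assumes unlinked loci and no selection.
    """
    if n_backcrosses < 0:
        raise ValueError("number of backcrosses must be non-negative")
    return 1.0 - 0.5 ** (n_backcrosses + 1)


for n in (0, 1, 2, 3):
    recurrent = recurrent_parent_fraction(n)
    print(f"BC{n}: recurrent {recurrent:.4f}, donor {1.0 - recurrent:.4f}")
```

Under this expectation, BC2 and BC3 lines carry on average 12.5% and 6.25% donor genome, respectively, which is small enough to evaluate donor segments in an elite background while still sampling a useful amount of core-collection diversity.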
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Abstract 


Ancient DNA (aDNA) promises to revolu- 
tionise our understanding of crop evolution. 
Wheat has been a major crop for millennia 
and has a particularly interesting history of 
domestication, dispersal, and hybridisation, 
summarised briefly here. We review how the 
fledgling field of wheat archaeogenomics 
has already contributed to our understand- 
ing of this complex history, revealing the 
diversity of wheat in ancient sites, both in 
terms of species and genetic composition. 
Congruently, ancient genomics has identified 
introgression events from wild relatives dur- 
ing wheat domestication and dispersal. We 
discuss the analysis of degraded aDNA in the 
context of large, polyploid wheat genomes 
and how environmental effects on preserva- 
tion may limit aDNA availability in wheat. 
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Despite these challenges, wheat archaeog- 
enomics holds great potential for answering 
open questions regarding the evolution of this 
crop, namely its domestication, the different 
dispersal routes of the early domestic forms 
and the diversity of ancient agricultural prac- 
tices. Not only will this research enhance our 
understanding of human history, but it will 
also contribute valuable knowledge about 
ancient selective pressures and agriculture, 
thus aiding in addressing present and future 
agricultural challenges. 


Keywords 


Archaeogenomics - Domestication - Genetic 
diversity 


7.1 Shining a Light on the Past: The Promise of Ancient DNA


Ancient DNA (aDNA) has fostered a revolu- 
tion in evolutionary genomics, as it allows direct 
observation of historical molecular diversity 
(Der Sarkissian et al. 2014). Previously, hypoth- 
eses were based solely on the observation of 
modern genetic diversity, which is the end effect 
of thousands of years of evolution, with the 
main caveat that the same pattern of genetic variation 
is often consistent with different historical 
scenarios (Lawson et al. 2018). The analysis of 
aDNA allows the genomic characterization of 
populations at different points in time, adding a 
fundamentally new dimension to evolutionary 
studies (Gutaker and Burbano 2017; Orlando 
et al. 2021). 

The very first aDNA analysis was conducted 
on a mitochondrial sequence of a museum-preserved 
quagga (Higuchi et al. 1984). Since 
then, the field of archaeogenomics has rapidly 
flourished (Morozova et al. 2016), allowing for 
a better understanding of human, animal, and 
plant evolutionary history. Recent advances in 
this field include sedimentary, epigenetic, pathogen, 
and microbiome aDNA analysis (Key et al. 
2020; Parducci et al. 2017; Pedersen et al. 2014; 
Spyrou et al. 2019; Warinner et al. 2014). 

aDNA has already had a remarkable impact 
on our understanding of human history, shed- 
ding light on important patterns of migration 
(Lacan et al. 2011), admixture (Yang et al. 
2020), adaptation (Marciniak and Perry 2017), 
population dispersal, expansion, and decline 
(Nielsen et al. 2017). Notably, aDNA has made 
fundamental contributions to our knowledge of 
the genetic relationships between modern 
humans and their extinct relatives, Neanderthals 
(Weyrich et al. 2015) and Denisovans (Krause 
et al. 2010; Reich et al. 2010), the latter of 
which have only been identified through aDNA 
analysis. Similar insights have been gained in 
other animals, such as dogs (Botigué et al. 2017; 
Leathlobhair et al. 2018), cattle (Daly et al. 
2018; Verdugo et al. 2019), pigs (Frantz et al. 
2019), and horses (Gaunitz et al. 2018). These 
studies have led to a reassessment of previous 
evidence and an overturning of the existing nar- 
rative (Librado et al. 2021). 

Now, aDNA promises a similar revolution 
in our understanding of how crops have been 
domesticated and spread around the globe, 
and the ways that these processes have shaped 
genetic diversity. By revealing how crops have 
adapted to new environments and what genetic 
diversity has been lost, aDNA can also set a 
basis for future breeding strategies (di Donato 
et al. 2018; Pont et al. 2019b). Crop archaeogenomics 
is still in its infancy, but aDNA from 
several important crops has been analysed, 
including maize (Ramos-Madrigal et al. 2016), 
barley (Mascher et al. 2016; Palmer et al. 2009), 
cotton (Palmer et al. 2012), bean (Trucchi et al. 
2021), sunflower (Wales et al. 2018), sorghum 
(Smith et al. 2019), watermelon (Renner et al. 
2019), and emmer wheat (Scott et al. 2019). 

In this chapter, we first give a very brief over- 
view of the history of wheat cultivation and the 
key genetic changes involved. The aDNA tech- 
nology promises unique insights in this area. 
We review the wheat aDNA studies carried out 
so far and their contribution to understanding 
phenomena that have shaped wheat genomes. 
To conclude, we discuss the key open questions 
in this field and the limitations posed by 
wheat’s large polyploid genome and idiosyncratic 
preservation. Our goal is to give an overview 
of the important answered and unanswered 
questions in the history of wheat cultivation and 
the promise of aDNA for resolving them. 


7.2 A Brief History of Wheat Cultivation 


Human societies have relied on wheat for thou- 
sands of years. Thus, the history of wheat 
domestication, geographic expansion, and cul- 
tivation has cross-disciplinary significance 
(Fig. 7.1). Understanding how wheat genetic 
diversity has been shaped also has contemporary 
relevance due to its continued nutritional and 
economic importance. Archaeogenomic studies 
aim to give new information about at least three 
key aspects of this process: domestication, dis- 
persal, and gene flow between different wheat 
species. To contextualize contributions from 
archaeogenomics, we briefly overview these 
basic tenets of wheat cultivation history. 


7.2.1 Domestication 

Wild tetraploid emmer wheat was one of the first 
species to be domesticated (Haas et al. 2018), 
during the so-called Neolithic Transition, in 

Fig. 7.1 Wheat has been culturally important for millennia, 
and DNA extracted from ancient specimens can 
reveal how humans have shaped crop genetic diversity. 
Left: Facsimile of a vignette on the tomb of Sennedjem 
and Iineferti showing grain harvest in the abundant 
fields of the next life (painted by Charles K Wilkinson 


parallel with humans' shift from hunting and 
gathering to agriculture and animal husbandry 
(Diamond 2002). The quintessential trait for 
cereal domestication is the loss of rachis brittle- 
ness: in wild cereals, the spikelets disarticulate 
spontaneously from the rachis upon maturity, 
ensuring seed dispersal and germination. In 
domestic cereals, the rachis is non-brittle; spike- 
lets remain attached, allowing easier harvesting 
but requiring subsequent sowing in the follow- 
ing season in order to germinate. Because plants 
with a non-brittle rachis depend on human 
action for dispersal, this phenotype has been 
used to define domestication in cereals (Abbo 
et al. 2014; Snir et al. 2015). Loss-of-function 
mutations in the TtBtr1-A and TtBtr1-B genes on 
chromosomes 3A and 3B are the main determinants 
of this phenotype (Avni et al. 2017; Nave 
et al. 2019). Therefore, alleles at these two loci 
essentially distinguish wild from domesticated 
emmer wheat. Other traits that are favourable 
in the human-mediated environment and most 
likely deleterious in a wild environment (Kantar 
et al. 2017; Purugganan and Fuller 2009) give 
a broader definition of the “domestication 
syndrome” (Larson et al. 2014), such as the loss of 
seed dormancy and larger seed size (Haas et al. 
2018; Zohary 2013). 

Wild emmer wheat has a very restricted dis- 
tribution, growing only in the Fertile Crescent 
region of Southwest (SW) Asia (Vavilov et al. 

in 1922 CE, original ca. 1295-1213 BCE, public 
domain image from the Metropolitan Museum of Art). 
Right: Archaeological specimens of desiccated emmer 
wheat chaff from Egypt. Photo from Dorian Q. Fuller, 
University College London, Institute of Archaeology 


1992). The exact location of the emergence of 
domestic emmer has been a long-standing con- 
troversy. In the 2000s, early genetic studies 
started addressing this issue, with the so-called 
cradle of agriculture theory (Lev-Yadun et al. 
2000). Further genetic studies pointed to 
the Northern Fertile Crescent and specifically to 
the Karaca Dag Mountain region as the centre of 
domestication of emmer wheat (Luo et al. 2007; 
Ozkan et al. 2002, 2005), mostly based on the 
greater similarity between the genomes of 
modern domestic landraces and wild emmer 
from the Northern Levant than wild emmer from 
the Southern Levant (Avni et al. 2017). 
However, this monophyletic origin has been 
challenged with increasing evidence that differ- 
ent wild populations have contributed to domes- 
tic wheats. Several authors argue that domestic 
emmer wheat arose from an admixed wild popu- 
lation and that mutations for domestication traits 
appeared on different chromosomes at different 
times and possibly in different places (Civáň 
et al. 2013; Jorgensen et al. 2017; Oliveira et al. 
2020). This is in line with the observation that 
the domestic phenotype, which requires at least 
two independent recessive mutations, took mil- 
lennia to be established (Avni et al. 2017; Fuller 
et al. 2014). As testified by the archaeological 
record, wild emmer wheat was first exploited 
in the Southern Levant, where increasing, albeit 
small, proportions of phenotypically 
domestic emmer wheat are found at different 
archaeological sites as early as the Early Pre-Pottery 
Neolithic B (8700-8200 BCE) (Arranz-Otaegui 
et al. 2018). However, domesticated 
emmer is found in very high proportions in 
the Northern Levant starting from the Middle/ 
Late Pre-Pottery Neolithic B (8200-6300 BCE) 
(Arranz-Otaegui et al. 2016). This indicates that 
wild emmer was managed (a phenomenon often 
referred to as “pre-domestication cultivation”; 
Fuller et al. 2010) long before the domestic 
forms emerged, and that wild populations 
from across the Fertile Crescent probably contributed 
to the domestic pool (Feldman and Kislev 
2007). The role of introgression from wild to 
domestic wheat has been demonstrated by several 
studies (e.g. Cheng et al. 2019; Pont et al. 
2019b; Przewieslik-Allen et al. 2021), even 
though the context in which these introgression 
events took place remains unknown. 

Overall, archaeology and genetics point to a 
slow and geographically widespread domestica- 
tion process in which both the Northern Levant 
and the Southern Levant played an important 
role. 


7.2.2 Evolution 


Domestic emmer wheat (Triticum turgidum 
subsp. dicoccon) gave rise to today’s most eco- 
nomically important wheats: tetraploid durum 
wheat (T. turgidum subsp. durum) and hexaploid 
bread wheat (T. aestivum subsp. aestivum). 
These descendants differ from their ancestor in 
one character of great agricultural importance: 
the free-threshing phenotype. Emmer is a hulled, 
non-free-threshing wheat, and the extraction of 
seeds from husks requires substantial mechani- 
cal processing. On the other hand, durum and 
bread wheat are naked and free-threshing: as 
the spikelets disarticulate from the rachis they 
fall apart, releasing the seeds without further 
processing. While durum wheat is tetraploid 
(BBAA), bread wheat is hexaploid (BBAADD) 
and evolved from the hybridization of tetraploid 
wheat with the diploid wild goatgrass (Aegilops 
tauschii), donor of the D subgenome (Haas et al. 
2018; Pont et al. 2019a). The tetraploid that 
contributed the B and A subgenomes to bread 
wheat has been a matter of debate (Sharma et al. 
2019), but considering the need for multiple 
mutations to determine the free-threshing phe- 
notype, the most supported (and most parsimo- 
nious) models indicate that hybridization with 
A. tauschii occurred with a free-threshing tetra- 
ploid (Zhou et al. 2020). 

The emergence of modern wheat is therefore 
the result of three processes: (I) domestication 
of wild emmer wheat, associated with the loss 
of rachis brittleness; (II) crop evolution (often 
also referred to as crop improvement under 
cultivation), which includes the emergence of 
the free-threshing phenotype and adaptation to 
new ecological niches; (III) allopolyploidization 
between a free-threshing tetraploid and A. 
tauschii, giving rise to bread wheat. We summarize 
these changes in Fig. 7.2. 

Perhaps surprisingly, hulled wheats con- 
tinued to be used for thousands of years after 
the appearance of free-threshing durum wheat 
and bread wheat. The slow and regionally spe- 
cific shifts in wheat usage probably reflect cul- 
tural practices and preferences (Nesbitt and 
Samuel 1996). Also, increasing archaeologi- 
cal evidence shows that early farmers relied 
on a wide range of other domestic wheats for 
their subsistence, including einkorn, spelt, and 
Triticum timopheevii alongside emmer and free- 
threshing wheats (Ozbasaran et al. 2018). This 
is in accordance with the evidence for intra- and 
interspecific introgression that has been detected 
in modern wheat (Cheng et al. 2019; Zhou et al. 
2020). 


7.3 Archaeogenomics of Wheat 

Wheat archaeogenomics is a powerful tool to 
investigate how wild wheat evolved into domes- 
tic forms and how these domestic wheat varie- 
ties adapted to different ecological niches and 
cultural preferences through history. 

However, the limitations and the character- 
istics of ancient genomes have to some extent 
impacted the approach taken in this research 

Fig. 7.2 Schematic representation of the domestication 
and evolution of the most economically important wheats 
today, showing important phenotypes and the mutations 
that determine them. Basic information about the appear- 
ance of the different wheats in the archaeological record 
is given on the right. The small white hand represents the 
investment of human labour in processing the harvest. 


field. Before high-quality reference genomes 
were available, most studies avoided whole-genome 
analysis and used a targeted amplification 
strategy. This mitigates the challenges 
of a large genome but yields much less 
genomic information. Furthermore, the primers 
used for amplification mask the characteristic 
patterns of degradation that are useful for ruling 
out contamination by confirming the antiquity of 
the DNA. Unlike these amplification methods, 
whole-genome libraries can also be re-analysed 
to get more data without further destructive sam- 
pling of rare material. For these reasons, ampli- 
fication approaches are no longer recommended 
for ancient samples (Gutaker and Burbano 2017; 
Prüfer and Meyer 2015). 


*For simplicity, we use the common name “durum 
wheat” for all free-threshing tetraploids, but other com- 
mon names are used for free-threshing tetraploids, and 
it is not known which was involved in this allopolyploid 
event. This scheme is an adaptation of the model pro- 
posed by Sharma et al. (2019) 


We first overview wheat aDNA studies that 
use amplification and then describe the first two 
whole-genome analyses. Even though wheat 
archaeogenomics is in a germinal stage, the 
results have shifted our understanding of wheat 
genetics in important ways. 


7.3.1 Target Gene Amplification 

The most common use of target gene amplifi- 
cation has been to interrogate key genes or to 
identify wheat remains at the species level. The 
x and y copies of the Glu-1 loci were often the 
focus of early studies. These genes, present in 
all wheat subgenomes, are located on the long 
arms of chromosome 1 and encode the high 
molecular weight glutenin subunits (HMW-GSs), 
storage proteins present in the starchy 
endosperm cells of wheat. Allelic variants of 
these genes affect the properties of dough for 
bread making. Because of their effect on bread 
quality, the evolution of the HMW genes can 
provide insights into the nature of human selective 
pressures during wheat evolution (Allaby 
et al. 1999). In that study, the authors surveyed 
these loci in a collection of modern and ancient 
wheats, constructed a phylogenetic tree, and 
obtained time estimates by using a substitution 
rate to calibrate the observed variation. By com- 
paring the genetic variability for x and y copies 
in each genome, they were able to determine 
that the genetic variability in these loci for the 
cultivated species predates domestication, point- 
ing to either incomplete lineage sorting, multi- 
ple domestication events, or introgression after 
domestication. Another study used a similar 
approach with the same loci to inquire about the 
origins of spelt (Blatter et al. 2002). They sur- 
veyed a collection of modern and ancient bread 
wheat and spelt specimens and determined that 
the higher genetic variability of spelt relative 
to bread wheat in the A and B genomes 
is compatible with spelt having originated from a 
hybridization event between bread wheat and 
hulled tetraploid emmer. 

HMW genes have also been used to identify 
wheat remains at the subspecies level and 
to shed light on wheat dispersal. Without associated 
chaff, it is difficult to distinguish between 
free-threshing wheats (e.g. bread wheat and 
durum wheat). Bilgic et al. (2016) targeted the 
HMW promoter region in 8400-year-old specimens 
from a well-known Neolithic site in central 
Turkey, Çatalhöyük, to determine whether 
the genetic variability characteristic of the D 
genome could be recovered, as proof that the 
wheat was hexaploid. The finding of HMW 
subunits from the A, B, and D genomes is quite 
remarkable, since it provides evidence for the presence of 
hexaploid wheat at a very early point in time 
and highlights the importance of this settlement 
in the expansion of hexaploid wheat cultivation. 
Another study used the Internal Transcribed 


A. lob et al. 


Spacer regions (ITS1 and ITS2) and the Inter- 
Genic Spacer region (IGS) from the nuclear 
ribosomal DNA for species level identification 
(Li et al. 2011). They also found early evidence 
for hexaploid wheat in Northwest China around 
1760-1540 BCE. 

These results highlight the high diversity of 
wheats consumed by humans during early agri- 
cultural expansion. Free-threshing naked wheats 
first appear in the archaeological record between 
7000 and 5500 BCE (Feldman and Kislev 
2007). Early naked wheats co-existed with 
domestic and wild emmer populations (Bilgic 
et al. 2016), giving opportunities for genetic 
exchange. Along with the protracted period of 
emmer domestication, this probably explains the 
higher genetic diversity on A and B subgenomes 
of modern bread wheat compared to the D sub- 
genome (Cheng et al. 2019). This demonstrates 
how the details of agricultural history directly 
impact modern wheat diversity and breeding. 
Moreover, other wild Triticum species gave rise 
to domestic forms during the Neolithic. These 
include the diploid einkorn wheat, Triticum 
monococcum subsp. monococcum, that emerged 
from wild einkorn, T. monococcum subsp. 
aegilopoides (Nesbitt and Samuel 1996); spelt 
(Triticum spelta), a hulled hexaploid; and 
tetraploid T. timopheevii (domesticated from T. 
timopheevii araraticum) (Wagenaar 1966), only 
recently classified thanks to aDNA analysis. 

The position of T. timopheevii within the 
domestication process of wheat in SW Asia 
exemplifies the value of aDNA for gaining insights 
into domestication processes. Briefly, due 
to the technical difficulties in the identification 
of T. timopheevii, for a long time its existence 
was questioned, and it was often unclassified, 
or ascribed to other wheat species, such as 
“New Glume Wheat”. Recently, archaeological 
remains described as “New Glume Wheat” have 
been designated as domestic T. timopheevii 
based on aDNA evidence (Czajkowska et al. 
2020). The authors used the Ppd1 locus to identify 
G genome alleles in “New Glume Wheat” 
remains. This study has sparked the interest 
of the archaeobotanical community. Decades 
have passed since the first classification of 
an archaeological specimen as “New Glume 
Wheat”. It was not until numerous remains 
of this type of wheat were found in several 
Neolithic and Bronze Age archaeological sites in 
northern Greece and compared with other loca- 
tions (Jones et al. 2000) that archaeologists were 
able to describe the distinctive features of this 
wheat (Ulas and Fiorentino 2021). Nevertheless, 
identification based on grain morphology is still 
problematic. The identification of New Glume 
Wheat as domestic T. timopheevii thanks to 
ancient DNA analysis has had important ramifications 
for our understanding of the complexity 
of the domestication process in SW Asia and 
confirms that multiple species evolved into 
domestic forms, moving away from the “founder 
crops” theory. T. timopheevii was actually cultivated 
for a very long period of time in certain 
regions. New efforts are now being undertaken 
to revisit archaeobotanical assemblages and 
reassess the relative abundance of plant species, 
with the expectation that many grains classi- 
fied as emmer wheat will now be classified as T. 
timopheevii. 

The HMW loci were also used, together with 
the ribulose-1,5-bisphosphate carboxylase gene (rbcL) 
and the microsatellite WCT12, both in 
the chloroplast genome, to study the viability 
of DNA extraction from ancient plant specimens 
(Fernández et al. 2013). In this study, 126 grains 
of naked wheat in different preservation con- 
ditions (charred, partially charred, and water- 
logged) were analysed (Fig. 7.3 shows different 
preservation conditions of ancient wheat sam- 
ples). Results showed that DNA extraction from 
totally charred remains is virtually impossible, 
while DNA amplification of modern contami- 
nants is pervasive. Unfortunately, almost all of 
the most ancient archaeological wheat speci- 
mens are charred, which is a severe limitation 
for future aDNA studies. 

As mentioned above, one important limi- 
tation of amplification-based studies is the 
confidence with which one can rule out con- 
tamination. Commonly used indicators such as 
the fragment length distribution or deamination 
patterns are difficult to assess in target-specific 
PCR amplification studies. In addition, Allaby 
et al. (1999) reported PCR jumping, probably 
related to the shortness of some fragments. 
Their results showed patterns of linked diversity 
that did not exist in the modern pool, and they had to 
manually rearrange the observed diversity so that it 
would match known modern haplotypes, with the 
attendant potential for bias. 
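
To make concrete what is lost with amplicon data, the following minimal Python sketch tabulates the frequency of C-to-T mismatches by distance from the 5′ end of shotgun reads, one of the damage indicators mentioned above. It assumes, purely for illustration, gap-free read/reference pairs supplied as plain strings; the function name and the toy alignments are hypothetical and are not taken from any of the cited studies. In authentic aDNA this frequency is typically elevated at the read ends, whereas PCR primers overwrite exactly those positions.

```python
from collections import Counter


def c_to_t_by_position(alignments, n_positions=5):
    """Return the C->T mismatch frequency at each of the first
    n_positions from the 5' end.

    alignments: iterable of (reference, read) strings of equal length,
    written 5'->3' and assumed to contain no indels (a simplification).
    """
    ref_c = Counter()   # times the reference carries C at position i
    c_to_t = Counter()  # times the read shows T where the reference has C
    for ref, read in alignments:
        pairs = zip(ref[:n_positions], read[:n_positions])
        for i, (r, q) in enumerate(pairs):
            if r == "C":
                ref_c[i] += 1
                if q == "T":
                    c_to_t[i] += 1
    return [
        c_to_t[i] / ref_c[i] if ref_c[i] else float("nan")
        for i in range(n_positions)
    ]


# Hypothetical toy data: two of four reads carry a C->T change at the first base.
alignments = [
    ("CCGAC", "TCGAC"),
    ("CAGCT", "TAGCT"),
    ("CCTGA", "CCTGA"),
    ("CGCAT", "CGCAT"),
]
rates = c_to_t_by_position(alignments, n_positions=3)
print(rates)
```

With real data the same tabulation would be run on aligned sequencing reads; the point here is only that the signal lives at the fragment ends.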

Different strategies have been used to 
increase confidence in the antiquity of the data. 
Allaby et al. (1999) replicated the results in situ with 


Fig. 7.3 Examples of different preservation condi- 
tions of archaeobotanical wheat. Left: charred emmer 
wheat seeds from the Vinča culture in Serbia (middle/ 
late Neolithic; c. 5400-4600/4500 BC), published in 


Filipovic (2014). Right: Waterlogged chaff remains of 
Triticum cf. durum/turgidum from the end of the 5th mil- 
lennium BC at the site of Les Bagnoles. Photo by Raül 
Soteras, AgriChange Project, reproduced with permission 

the same specimen and produced blanks with 
each extraction run. Czajkowska et al. (2020) 
performed the extractions in laboratory facili- 
ties where no wheat had been processed before, 
hoping to preclude contamination. Bilgic et al. 
(2016) processed all samples in two different 
facilities, so that replication of the results acts 
as a proof of authenticity. In spite of this, even 
if contamination can be ruled out, it is not pos- 
sible to distinguish deamination patterns from 
true polymorphisms. Therefore, phylogenetic 
analyses and interpretation of the accumulation 
of variation through time should be taken with 
caution unless transitions (C/T or G/A SNPs) 
are excluded. 
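
The exclusion of transitions can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the SNP records are hypothetical tuples of chromosome, position, reference allele and alternative allele, and the function names are ours rather than part of any published pipeline.

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T)
# changes; these are the substitution classes that deamination damage mimics.
TRANSITIONS = {frozenset("CT"), frozenset("AG")}


def is_transition(ref: str, alt: str) -> bool:
    """True if the ref/alt pair is a C/T or G/A SNP."""
    return frozenset((ref.upper(), alt.upper())) in TRANSITIONS


def keep_transversions(snps):
    """Drop transition SNPs from an iterable of (chrom, pos, ref, alt)."""
    return [snp for snp in snps if not is_transition(snp[2], snp[3])]


# Hypothetical example records.
snps = [
    ("1A", 1050, "C", "T"),  # transition, removed
    ("1A", 2210, "A", "C"),  # transversion, kept
    ("1B", 310, "G", "A"),   # transition, removed
    ("1D", 9875, "G", "T"),  # transversion, kept
]
filtered = keep_transversions(snps)
print(filtered)
```

The cost of this conservative filter is the loss of a large share of sites, since transitions are the more common class of SNP, so it trades data volume for robustness to damage.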
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7.3.2 Whole-Genome Analyses 


As with modern wheat samples, the genomic 
scale of archaeological wheat genetics has 
been expanded since the publication of refer- 
ence genomes (Table 7.1). Nevertheless, only 
two studies have so far reported whole-genome 
sequences from archaeological wheat specimens. 
One analysed several bread 
wheat remains from China to infer dispersal into 
the region (Wu et al. 2019). The earliest bread 
wheat remains found in China date to approxi- 
mately 4500 years ago in the north-western part 
of the country, but the most interesting aspect of 
its dispersal is that upon its arrival, wheat had to 


Table 7.1 Genomic information available for wheats and relatives mentioned in the text 

Species name          Genome(s)  Genome size  Common name              Key phenotypes                       Reference genome(s) 
Aegilops tauschii     D          4 Gb         Tausch’s goatgrass                                            Luo et al. (2017) 
Triticum urartu       A          4.5 Gb       Wild red einkorn         Brittle rachis, hulled               Ling et al. (2018) 
Triticum monococcum   A^m        5.7 Gb       Wild einkorn             Brittle rachis, hulled               NA 
                                              Einkorn                  Non-brittle rachis, hulled           NA 
Triticum turgidum     BA         12 Gb        Wild emmer               Brittle rachis, hulled               Avni et al. (2017), Zhu et al. (2019) 
                                              Emmer                    Non-brittle rachis, hulled           NA 
                                              Durum                    Non-brittle rachis, free-threshing   Maccaferri et al. (2019) 
Triticum timopheevii  GA         5.7 Gb       Wild Timopheev’s wheat   Brittle rachis, hulled               NA 
                                              Timopheev’s wheat        Non-brittle rachis, hulled           NA 
Triticum aestivum     BAD        17 Gb        Spelt                    Non-brittle rachis, hulled           Walkowiak et al. (2020) 
                                              Bread/Common             Non-brittle rachis, free-threshing   Appels et al. (2018), Alonge et al. (2020), Walkowiak et al. (2020) 

This is not a comprehensive list of wheat species/subspecies 

be adapted to a wide variety of climatic conditions. 
Ancient wheat from two archaeological 
sites within the Xinjiang winter-spring wheat 
zone was analysed. Even though coverage was 
extremely low (0.01-0.25×), the authors were 
able to call more than 7000 SNP sites, com- 
pare them with modern data from neighbouring 
regions, and provide new evidence on wheat dis- 
persal in China, a still controversial topic. Their 
results were consistent with one of the routes 
that had been previously suggested: an early dispersal 
into the Qinghai-Tibetan Plateau, based 
on the highest genetic similarity between the 
ancient samples and the modern ones from that 
region. Conversely, another ancient route that 
advocated for an introduction towards the east- 
ern region was not supported. However, more 
data is needed to determine whether different 
gene pools were introduced to China and to 
confirm that modern landraces correspond with 
ancient ones from the same area. 
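
To give a feel for what such coverage values mean, the sketch below estimates the fraction of genomic sites expected to be seen in at least one read at a given mean depth. It rests on a simplifying assumption that is ours, not the cited study's: reads fall uniformly at random, so per-site depth is Poisson distributed. Under that assumption roughly 1% and 22% of sites are covered at least once at 0.01× and 0.25×, respectively, and real aDNA data, with uneven mappability in a repetitive genome, will usually do worse.

```python
import math


def fraction_covered(mean_depth: float, min_reads: int = 1) -> float:
    """P(depth >= min_reads) when per-site depth ~ Poisson(mean_depth).

    The Poisson model is an idealisation (uniform, independent reads).
    """
    p_below = sum(
        math.exp(-mean_depth) * mean_depth ** k / math.factorial(k)
        for k in range(min_reads)
    )
    return 1.0 - p_below


for depth in (0.01, 0.25):
    once = fraction_covered(depth)
    twice = fraction_covered(depth, min_reads=2)
    print(f"{depth:.2f}x: >=1 read at {once:.1%} of sites, >=2 reads at {twice:.2%}")
```

Because so few sites are covered more than once, analyses at this depth generally rely on single-read (pseudo-haploid) calls at positions already known to vary in modern samples.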

Another whole-genome analysis of archae- 
obotanical specimens looked at two desic- 
cated samples of 3000-year-old emmer wheat 
chaff (Fig. 7.4) from Egypt (Scott et al. 
2019) to investigate early wheat dispersal and 


introgression from wild populations. The ancient 
samples were used to genotype exonic SNPs 
that segregate in modern accessions, at which 
coverage was 0.48× after quality control, yielding 
approximately 100,000 high confidence 
genotypes. The authors used a haplotype-based 
approach to overcome as much as possible the 
limitations of aDNA analysis of polyploid spe- 
cies. Nearby sites that are not broken apart by 
recombination form co-inherited blocks called 
haplotypes. A “haplotype reference panel” 
combines information from multiple modern 
genomes to characterise the haplotypic varia- 
tion at each genomic location (McCarthy et al. 
2016). In the analysis of ancient data, when a 
sufficient number of genotypes can be iden- 
tified within a region, it is possible to assign a 
known haplotype (or no known haplotype, as 
may be the case when ancient diversity has 
been lost in existing populations) to the ancient 
sample. At this point, non-sequenced geno- 
types within the region can be deduced based 
on haplotype assignment, a method called 
imputation. Haplotypes are relatively long in 
wheat (Walkowiak et al. 2020) because self- 
ing tends not to break apart haplotypes as much 




Fig. 7.4 Desiccated emmer wheat chaff from Hememiah North Spur (Egypt) 14C dated 1300-1000 BC, analysed by 
Scott et al. (2019). Photo by Chris J. Stevens, reproduced with permission 

as outcrossing. As a consequence, low-coverage 
data is more likely to yield enough sites to 
assign an individual to a haplotype. This method 
allowed Scott et al. (2019) to identify genomic 
tracts tens of megabases long containing hun- 
dreds of genotypes that matched a modern 
sample in the haplotype reference panel. These 
included regions where important domestication 
QTLs had been identified, such that the domesti- 
cation allele can be imputed and the phenotype 
inferred. In contrast, other genomic regions did 
not match anything in the haplotype reference 
panel. 
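
The logic of haplotype assignment and imputation can be illustrated with a deliberately simplified Python sketch. It is a toy model and not the method of Scott et al. (2019): haplotypes are plain strings of alleles within one genomic window, the ancient sample is observed at only a few positions, and the thresholds `min_sites` and `max_mismatch` are hypothetical parameters chosen for the example.

```python
from typing import Dict, Optional


def assign_haplotype(
    observed: Dict[int, str],
    panel: Dict[str, str],
    min_sites: int = 3,
    max_mismatch: float = 0.1,
) -> Optional[str]:
    """Return the best-matching panel haplotype, or None if too few sites
    were observed or no haplotype matches well enough ("no known haplotype").
    """
    if len(observed) < min_sites:
        return None
    best_name, best_rate = None, 1.0
    for name, alleles in panel.items():
        mismatches = sum(alleles[i] != a for i, a in observed.items())
        rate = mismatches / len(observed)
        if rate < best_rate:
            best_name, best_rate = name, rate
    return best_name if best_rate <= max_mismatch else None


def impute(observed: Dict[int, str], panel: Dict[str, str], name: str) -> str:
    """Fill unobserved positions with the alleles of the assigned haplotype."""
    ref = panel[name]
    return "".join(observed.get(i, ref[i]) for i in range(len(ref)))


# Hypothetical reference panel for one 10-site window.
panel = {
    "hapA": "ACGTACGTAC",
    "hapB": "ACGTTCGAAC",
    "hapC": "TCGAACGTTC",
}
# Sparse ancient genotypes: site index -> observed allele.
observed = {0: "A", 4: "T", 7: "A", 9: "C"}

match = assign_haplotype(observed, panel)
print(match, impute(observed, panel, match) if match else None)

# A sample that matches nothing in the panel is left unassigned.
novel = assign_haplotype({0: "G", 4: "G", 7: "G", 9: "G"}, panel)
print(novel)
```

The longer the haplotype blocks, the more sparse observations fall within a single window, which is why the long haplotypes of a selfing species make this approach workable at low coverage.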

The data essentially confirmed that genetic 
changes associated with domestication were 
completed by 3000 years ago, prior to emmer 
wheat dispersal to Egypt. Nevertheless, the 
ancient Egyptian sample carried more “unique” 
haplotypes than any other domesticated sample 
in the dataset, indicating regions where genetic 
diversity has been lost. It is not yet possible to 
state whether this lost variation is associated 
with adaptation to local environmental condi- 
tions or confers other useful traits. Nevertheless, 
these results highlight geographic and genomic 
regions that may harbour genetic diversity that 
has been used in the past and therefore might 
be useful in the present and future. Moreover, 
while the highly repetitive nature of the wheat 
genome increases the chances of misalignment 
issues and subsequent inflated heterozygosity, 
Scott et al. (2019) found that the estimated het- 
erozygosity of the ancient sample fell within the 
range of the modern samples. This suggests that 
reliable genotypes can be obtained from ancient 
wheat, provided appropriate quality filters are 
used to restrict attention to sites that do not suf- 
fer from alignment problems. 

Important results from this study concern 
early emmer wheat dispersal. Ancient routes 
of dispersal generally define modern popula- 
tion structure and overall genetic similarity 
but, with the changing usage of different wheat 
species and the adoption of modern elite varie- 
ties, we have little grasp of historical popula- 
tion dispersal and replacement. Contemporary 
emmer wheat subpopulations (landraces) 
reflect the dispersal outside of SW Asia to the 
West (Mediterranean), to the Balkans (Eastern 
Europe), to Transcaucasia (Caucasus) and 
towards India and the Arabian peninsula (Indian 
Ocean) (Avni et al. 2017). The authors found 
that the ancient sample from Egypt resembles 
modern cultivars from the Indian Ocean sub- 
group, indicating a connection between early 
emmer dispersal to the East (across the Iranian 
Plateau and into the Indus valley) and to the 
South-West (Nile Valley). This is particularly 
interesting in light of the fact that Ethiopia cur- 
rently represents a region of genetic isolation 
and differentiation for tetraploid wheat. This 
ancient Egyptian sample also has signatures of 
gene flow with wild populations in the Southern 
Levant, which could have occurred during dis- 
persal towards Egypt or during Egyptian con- 
quests in the Ramesside era. We expect further 
aDNA studies to connect historical events with 
changes to wheat genetics. Answering these 
questions will not only bring a deeper understanding 
of wheat evolution, but also of human history, 
which has been intimately linked to wheat 
cultivation for millennia. 

Overall, the field of wheat archaeogenomics 
has yet to reach its full potential. However, the 
field is primed for new advances with the avail- 
ability of reference genomes and a wealth of 
resequenced modern landraces for comparison. 
While the prospects for studying DNA from 
charred remains are poor, many desiccated or 
waterlogged samples have great potential for 
further study. Archaeological research on water- 
logged sites is increasing, which promises new 
material to complement the specimens currently 
in museums and collections. 


7.4 Analysing Degraded DNA from Ancient Polyploid Wheat 


Degradation and contamination are key compli- 
cations for the reliable analysis of ancient DNA. 
To mitigate these problems, specific methods 
have been developed for sample preparation 
and downstream analysis (reviewed in Orlando 
et al. 2021). Even with appropriate methodology, 
DNA from ancient and historical samples 
cannot be used for all the applications that 
modern sequence data allows. We briefly over- 
view these general principles of ancient DNA 
analysis, before discussing the specific issues 
posed by wheat, as all these factors should be 
considered during study design and analysis. We 
expect future methodological improvements to 
address these challenges, raising the possibility 
of resolving further important questions in the 
history of wheat domestication and evolution. 


7.4.1 aDNA Damage 

A prominent difference between ancient and 
modern DNA is that ancient DNA is much more 
fragmented prior to extraction (Fig. 7.5a). Most 
DNA fragmentation occurs rapidly after death 
(Kistler et al. 2017), as the DNA “backbone” 
breaks down through a process called “hydrolytic
depurination”, which is biochemically
predicted to occur more rapidly with exposure 
to water and high temperatures (Lindahl 1993). 
Thus, local preservation and environmental 
conditions are key in determining DNA yield 
and quality in different samples. Nevertheless, 
fruitful DNA sequencing has been conducted 
from plant tissue that is thousands of years 
old and from tropical and warm environments 
(Fornaciari et al. 2018; Mascher et al. 2016;
Ramos-Madrigal et al. 2016; Renner et al.
2019). Overall, excellent DNA preservation has
been reported from plant remains in desiccated
and waterlogged conditions (Kistler et al. 2020).

Fig. 7.5 Characteristic patterns of DNA degradation
in sequence from a 3000-year-old emmer wheat sample
(Scott et al. 2019). a Shows the raw distribution of fragment
sizes and b shows misincorporations relative to
the reference genome after alignment. In this case, the
sequenced library was partially UDG treated such that
the misincorporations caused by post-mortem damage
are confined to a few base pairs at the fragment ends,
which are removed for further analysis

Besides fragmentation, the DNA sequence
itself undergoes modifications. Notably, a proportion
of cytosine residues lose an amine
group, becoming uracil residues, which are read
as thymine during sequencing (Briggs et al.
2007). This hydrolytic deamination occurs more
commonly on the single-stranded overhangs of
the fragmented DNA molecules. As a result,
when aligned to a reference genome, sequenced
ancient DNA has a higher proportion of C-to-T
misincorporations at the 5' end of each fragment.
Double-stranded DNA libraries will also
show a higher proportion of the complementary
misincorporation, G-to-A, at the 3' end of each
fragment after alignment.

These characteristic patterns of degradation
found in ancient samples can be useful to the
analysis, as they are evidence of the sample's
antiquity. Therefore, the most common approach
is to carry out a protocol developed for partial
UDG treatment (Rohland et al. 2015). With
this method, uracil-DNA-glycosylase (UDG)
is used to remove uracils (Briggs et al. 2010)
in the inner region of the fragments, but not at
their ends. In this way, some amount of damage
is maintained, but it is confined to the fragment
ends (Fig. 7.5b). Similarly, the distribution of
fragment lengths is used to confirm that the 
sequenced DNA is ancient, where large frag- 
ments may indicate contamination. Finally, 
paired-end sequencing of short fragments 
will often result in the same base pair being 
sequenced twice, which can be used to improve 
confidence in the sequence (Jonsson et al. 2014). 
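
As a minimal sketch of how the damage pattern described above can be tabulated, the following Python function counts C-to-T mismatches at the first few positions from the 5' end of aligned reads. The function name, the input format (gap-free read/reference string pairs written on the read's strand) and the toy data are illustrative assumptions; they are not the procedure of the cited studies, which rely on dedicated tools.

```python
from collections import Counter


def ct_rate_by_position(pairs, n_positions=5):
    """Return the C-to-T misincorporation rate at each of the first
    n_positions bases from the 5' end of the reads.

    pairs: iterable of (read, ref) strings of equal length, gap-free and
    written 5'->3' on the read's strand (a simplifying assumption).
    A position with no reference C is reported as None.
    """
    ref_c = Counter()   # times the reference shows C at position i
    c_to_t = Counter()  # ...of which the read shows T
    for read, ref in pairs:
        for i in range(min(n_positions, len(read), len(ref))):
            if ref[i] == "C":
                ref_c[i] += 1
                if read[i] == "T":
                    c_to_t[i] += 1
    return [c_to_t[i] / ref_c[i] if ref_c[i] else None
            for i in range(n_positions)]


# Hypothetical toy alignments: damage concentrated at the 5' end.
toy_alignments = [
    ("TGATC", "CGATC"),
    ("CGATC", "CGATC"),
    ("TCATC", "CCATC"),
    ("CTATC", "CCATC"),
    ("CCATC", "CCATC"),
    ("TCATC", "CCATC"),
]
profile = ct_rate_by_position(toy_alignments, n_positions=3)
print(profile)
```

In an authentic ancient library the rate is expected to be highest at the first position and to decay inwards; a flat profile close to zero would be more typical of modern DNA.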

Standard bioinformatic protocols have been
established for processing fragmented and damaged
DNA: short-read data are mapped to
reference genomes, and automated tools/pipelines
are available for calling genotypes from
ancient samples for downstream analyses (Peltzer et al. 2016;
Schubert et al. 2014). Common methods involve
trimming off all the base pairs at the end of frag- 
ments that are potentially affected by damage 
(Jonsson et al. 2014) and verifying that analyses 
are unaffected when transitions (SNPs where 
the two alleles are either C/T or G/A and that 
can include post-mortem damage) are excluded 
(Korneliussen et al. 2014). We further note that 
“reference bias” (preferential alignment of reads 
carrying the same allele as the reference) is 
stronger in ancient data due to the shorter frag- 
ment size, so correction methods should be used 
(Günther and Nettelblad 2019). 
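
The two safeguards mentioned above (trimming fragment ends and checking that results hold without transitions) can be sketched in a few lines of Python. The function names, the tuple layout of a SNP and the default of two trimmed bases are assumptions made for illustration only; the appropriate trimming length depends on the library treatment.

```python
# The two transition classes, stored as unordered allele pairs.
TRANSITIONS = {frozenset("CT"), frozenset("GA")}


def is_transition(allele_a, allele_b):
    """True if the two alleles form a C/T or G/A pair."""
    return frozenset((allele_a.upper(), allele_b.upper())) in TRANSITIONS


def trim_ends(seq, n=2):
    """Remove n bases from each end of a read; return '' if nothing is left."""
    return seq[n:len(seq) - n] if len(seq) > 2 * n else ""


def transversions_only(snps):
    """Keep SNPs that cannot be mimicked by deamination damage.

    snps: iterable of (chrom, pos, ref, alt) tuples (hypothetical layout).
    """
    return [s for s in snps if not is_transition(s[2], s[3])]


snps = [("1A", 100, "C", "T"), ("1A", 250, "A", "C"), ("2B", 40, "G", "A")]
print(transversions_only(snps))
print(trim_ends("TACGTACGTA", n=2))
```

Running an analysis once on all SNPs and once on the output of `transversions_only` gives a simple check of whether post-mortem damage is driving the result.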

For all these reasons, whole-genome
sequencing has become the standard in ancient
DNA studies. PCR-based approaches are
no longer considered except for very specific
goals such as genome identification, since they
do not allow these important patterns of
post-mortem damage to be verified or
contamination to be excluded.

Contamination is a significant concern in 
ancient DNA studies. Because the amount of 
DNA preserved in ancient samples tends to be 
low, relatively small amounts of contamination 
from contemporary material can overwhelm the 
target DNA in the library (Renaud et al. 2019). 
Extraction and manipulation of ancient DNA 
therefore requires specialized facilities with 
protocols that minimize contamination by mod- 
ern DNA (Fulton 2012). Standard practice is 
to create a control sequencing library without 
using the sample tissue (an “extraction blank”). 
The data from controls is analysed alongside
the main sample to quantify the contamination 
and spurious signals likely to have been intro- 
duced during DNA extraction. Contamination 
can also come from microbial decomposers that 
invade tissues after death. A simple estimate for 
overall contamination is the percentage of reads 
that can be aligned to the reference genome of 
the targeted species, although other methods are 
available (Peyrégne and Prüfer 2020). So far,
the percentage of endogenous DNA (the DNA 
of interest) reported in whole-genome studies of 
ancient plants has been high, compared to ani- 
mal studies. For example, reported endogenous 
fractions have been 33-66% in emmer wheat 
(Scott et al. 2019), 5-90% in bread wheat (Wu 
et al. 2019), 7-54% (mean 44%) in common
bean (Trucchi et al. 2021), and 70% in maize
(Ramos-Madrigal et al. 2016). 
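
The simple estimate described above, the share of reads that align to the target reference, can be written as a short Python function. This is a sketch under stated assumptions: the per-read tuple `(is_mapped, mapq)` and the optional mapping-quality threshold are hypothetical conveniences, not part of any cited pipeline.

```python
def endogenous_fraction(reads, min_mapq=0):
    """Fraction of sequenced reads that align to the target reference.

    reads: iterable of (is_mapped, mapq) tuples, one per read
           (hypothetical layout for illustration).
    min_mapq: optional mapping-quality threshold (an assumption; the
              simple estimate in the text counts every aligned read).
    """
    total = 0
    endogenous = 0
    for is_mapped, mapq in reads:
        total += 1
        if is_mapped and mapq >= min_mapq:
            endogenous += 1
    if total == 0:
        raise ValueError("no reads supplied")
    return endogenous / total


# Toy data: a sample library and an extraction blank.
sample = [(True, 37), (True, 12), (False, 0), (True, 40)]
blank = [(False, 0), (False, 0), (True, 3), (False, 0)]
print(endogenous_fraction(sample), endogenous_fraction(blank))
```

Applying the same function to the extraction blank gives a rough baseline for spurious alignments; in a repeat-rich, polyploid genome a mapping-quality threshold may be a useful additional safeguard.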

Degradation and contamination limit the 
applications of ancient DNA, relative to mod- 
ern DNA. Firstly, the fraction of endogenous 
DNA in well-preserved ancient DNA libraries
is far below that of modern DNA (which
is usually >99%). Because endogenous fragments
are short, the sequencer will often read
through the DNA fragment and continue onto 
the adapter sequences used for library prepara- 
tion. Sequenced adapter fragments must thus be 
discarded. Furthermore, if the sequencing has 
been performed for paired-ends, the forward 
and reverse reads will overlap (and are then 
collapsed into a consensus sequence). Given 
the low endogenous content and the short frag- 
ments, more sequence data is needed to reach 
reasonable coverage. Nevertheless, when small 
amounts of DNA are present in the sample, 
it may not be possible to keep sequencing to 
increase the coverage, since the library gradu- 
ally yields diminishing returns as more duplicate 
reads are sequenced (Link et al. 2017). For all 
these reasons, coverage tends to be significantly 
lower in aDNA studies, when compared to the 
expectations for modern data. 
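
The diminishing returns from deeper sequencing of a low-complexity library can be illustrated with a simple saturation model. The sketch below assumes that reads are sampled uniformly at random from a fixed number of distinct library molecules, which gives an expected number of unique reads of C(1 - exp(-N/C)) for N reads and complexity C. This model and the numbers used are illustrative assumptions, not an estimate taken from the cited work.

```python
import math


def expected_unique_reads(n_sequenced, library_complexity):
    """Expected number of distinct molecules seen after n_sequenced reads,
    assuming uniform random sampling from library_complexity molecules."""
    if library_complexity <= 0:
        raise ValueError("library_complexity must be positive")
    return library_complexity * (1.0 - math.exp(-n_sequenced / library_complexity))


def duplicate_fraction(n_sequenced, library_complexity):
    """Expected share of reads that are duplicates at a given depth."""
    if n_sequenced <= 0:
        return 0.0
    return 1.0 - expected_unique_reads(n_sequenced, library_complexity) / n_sequenced


# Hypothetical library with one million distinct molecules.
for depth in (1e6, 5e6, 2e7):
    print(int(depth), round(duplicate_fraction(depth, 1e6), 3))
```

Under these assumptions the duplicate fraction rises steadily with depth, so each additional sequencing run contributes fewer new endogenous fragments than the previous one.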

Overall, due to low coverage and short frag- 
ments in ancient DNA, a typical approach is to 
identify variable sites (e.g. SNPs) using modern 
samples only, then use ancient DNA alignments 
to genotype the ancient samples. Fortunately,
this approach often yields sufficient high-quality 
genotypes to perform analyses of interest, such 
as estimating genome-wide relatedness, intro- 
gression, and population genetic parameters. 
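
A minimal sketch of genotyping an ancient sample at sites discovered in modern samples is given below. It only counts bases that match one of the two known alleles and ignores everything else. The function name, the genotype strings and the minimum depth are assumptions for illustration; real studies use dedicated genotype callers or likelihood-based methods.

```python
def genotype_at_known_site(bases, ref, alt, min_depth=2):
    """Call a genotype at a site whose alleles are known from modern data.

    bases: base calls from the ancient reads covering the site (str or list).
    ref, alt: the two alleles identified in modern samples.
    Bases matching neither allele (errors, damage) are ignored.
    Returns '0/0', '0/1', '1/1', or None when depth is insufficient.
    """
    n_ref = bases.count(ref)
    n_alt = bases.count(alt)
    if n_ref + n_alt < min_depth:
        return None
    if n_alt == 0:
        return "0/0"
    if n_ref == 0:
        return "1/1"
    return "0/1"


print(genotype_at_known_site("AAAT", ref="A", alt="G"))  # T is ignored
print(genotype_at_known_site("GG", ref="A", alt="G"))
print(genotype_at_known_site("A", ref="A", alt="G"))     # too shallow
```

Restricting the call to alleles already seen in modern samples reduces the chance that sequencing errors or damage create spurious variants, at the cost of missing alleles that exist only in the ancient sample.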


7.4.2 Large Polyploid Wheat Genomes 


The large genome of wheat (17 gigabases 
for bread wheat) implies that whole-genome 
sequencing of each wheat sample requires 
more resources compared to other organisms 
with smaller genomes. This cost is exacerbated 
in ancient DNA studies by the lower fraction 
of endogenous DNA, which requires further 
sequencing effort to obtain the same genomic 
coverage. In wheat, pre-designed probes are 
available for exons and promoters (Gardiner et al. 
2019; Jordan et al. 2015), which reduce sequenc- 
ing costs by enriching for sequences that are 
captured by the probes used. In ancient DNA, 
capture can enrich endogenous DNA (Hofreiter 
et al. 2015) but increase clonality and introduce 
biases towards the sequence on the probes (Ávila- 
Arcos et al. 2011). Exome-wide capture has not 
been reported for an ancient wheat. However, tar- 
geted capture might be useful to avoid repetitive 
regions since short aDNA fragments give little 
information about this class of DNA. 
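
The effect of genome size and endogenous fraction on sequencing effort can be made concrete with a back-of-the-envelope calculation. The sketch below uses the 17-gigabase genome size quoted above; the mean fragment length of 50 bp and the endogenous fractions are illustrative assumptions, and duplicates and unmappable regions are ignored.

```python
def reads_required(target_coverage, genome_size_bp, mean_fragment_bp,
                   endogenous_fraction):
    """Approximate number of sequenced fragments needed to reach a target
    mean coverage, ignoring duplicates and unmappable regions."""
    if not 0 < endogenous_fraction <= 1:
        raise ValueError("endogenous_fraction must be in (0, 1]")
    if mean_fragment_bp <= 0:
        raise ValueError("mean_fragment_bp must be positive")
    endogenous_bases_needed = target_coverage * genome_size_bp
    return endogenous_bases_needed / (mean_fragment_bp * endogenous_fraction)


BREAD_WHEAT_GENOME_BP = 17e9  # genome size quoted in the text

# Hypothetical scenarios: 1x coverage with 50 bp fragments.
for fraction in (0.9, 0.5, 0.05):
    n = reads_required(1, BREAD_WHEAT_GENOME_BP, 50, fraction)
    print(f"endogenous {fraction:.0%}: about {n:.2e} fragments")
```

Under these assumptions, halving the endogenous fraction doubles the number of fragments that must be sequenced for the same coverage.
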

Ploidy and the high identity between subgenomes,
estimated to be as high as 97-98%,
pose another challenge for ancient DNA
studies. Even with modern samples, wheat 
resequencing studies can only reliably observe 
genomic regions that can be unambiguously 
aligned using the read lengths available. The 
shorter fragment length of ancient DNA places a 
practical limit on the portion of the genome that 
can be directly observed by mapping to refer- 
ence genomes. 

Heterozygosity is commonly used as an indi- 
cator of misalignment problems. Because wheat 
is predominantly selfing (Golenberg 1988), most 
sites should be homozygous in most individuals. 
However, various structural variants can cause 
reads from different genomic regions in the 
sample to be aligned to the same position in the 
reference genome (Fig. 7.6) with high mapping- 
quality scores, thus passing quality filters. As 
a consequence, sample heterozygosity will be 
inflated after calling genotypes. A common solu- 
tion is to remove variants that are heterozygous 
in multiple samples (e.g. Gardiner et al. 2019;
He et al. 2019). Recent data indicates that undetected
gene duplicates are common within wheat
subgenomes on reference assemblies (Alonge 
et al. 2020). In general, polyploid wheat resequencing
data will suffer from additional misalignments
due to homeologous sequences on
different subgenomes, but reliable genotypes
can be obtained from both modern and ancient
wheat provided appropriate quality filters are
used to restrict attention to sites that do not suffer
from alignment problems. Nevertheless, we
emphasize that care should be taken when measuring
heterozygosity in polyploid wheats, especially
from ancient genomes. The limitations
in estimating heterozygosity are unfortunate
because heterozygosity is a common
indicator of outcrossing and genetic variation
in the population, changes to which are key
questions in the history of cultivation practices
(Smith et al. 2019; Trucchi et al. 2021).

Fig. 7.6 False heterozygosity introduced by mis-mappings
to the reference. Here, we consider two genomic
regions (blue and yellow), which are homeologues or
duplicated regions that are relatively similar to one
another. A site in each region is genotyped (coloured purple
and green). In a, the sample is similar to the reference
so that reads can be aligned to the correct region,
and the genotype calls are all homozygous, as expected
for most sites in a largely selfing species. In b, there is a
difference between the reference genome and sequenced
genome (indicated in grey). The sample reads from the
blue genomic region in b are best aligned to the yellow
region of the reference. This results in a heterozygous
genotype call, while all the true genotypes are homozygous.
Thus, inaccurate reference genome assemblies,
deletions, insertions, or duplications can all result in spurious
heterozygous genotypes
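
The filter on sites that are heterozygous in multiple samples can be sketched as follows. The data layout (a dictionary from site to per-sample genotype strings) and the default threshold of one heterozygous sample are assumptions for illustration; the cited studies may use different criteria.

```python
def filter_het_sites(genotypes, max_het_samples=1):
    """Drop sites called heterozygous in more than max_het_samples samples.

    genotypes: dict mapping a site identifier to a list of genotype
               strings ('0/0', '0/1', '1/1' or None), one per sample.
    In a predominantly selfing species, recurrent heterozygous calls at
    the same site are more likely to reflect mis-mapped reads than true
    heterozygosity.
    """
    kept = {}
    for site, calls in genotypes.items():
        n_het = sum(1 for g in calls if g in ("0/1", "1/0"))
        if n_het <= max_het_samples:
            kept[site] = calls
    return kept


calls = {
    "1A:1000": ["0/0", "1/1", "0/0", "1/1"],  # clean site
    "1B:2000": ["0/1", "0/1", "0/1", "0/0"],  # suspicious: recurrent hets
    "1D:3000": ["0/0", "0/1", None, "1/1"],   # a single het is tolerated
}
print(sorted(filter_het_sites(calls)))
```

The threshold trades sensitivity for safety: a stricter value removes more mis-mapped sites but also discards any genuine heterozygosity, which is why heterozygosity estimates from filtered data should be interpreted with care.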


7.5 The Future of the Past: Open
Questions and Prospects
for Wheat aDNA


Crop archaeogenomics has already proved to be 
a powerful tool to investigate phenomena such 
as domestication, crop dispersal, and subsequent 
adaptation (Kistler et al. 2020; Orlando et al. 
2021). Studies on bean (Trucchi et al. 2021), sun- 
flower (Wales et al. 2019), and sorghum (Smith 
et al. 2019) showed that the “domestication bot- 
tleneck” (i.e. the initial loss of genetic diversity 
associated with domestication) may not be as 
intense as previously assumed. Ancient DNA 
analysis has been used to trace the origin of some 
important winemaking grape cultivars (Ramos- 
Madrigal et al. 2019) and brought insights on 
the genetic basis of potato adaptation to the 
European climate (Gutaker et al. 2019). In maize, 
adaptation to climatic constraints (selected from 
ancient standing variation within the domestic 
forms) has been identified as the main driver of 
modern differentiation between populations (Da 
Fonseca et al. 2015; Swarts et al. 2017). 


7.5.1 Open Questions in Domestication


In recent years, some paradigms of domestica- 
tion have been challenged by new scientific dis- 
coveries, and wheat represents a good example 
of such changing perspectives. First, because we
now know that domestic forms took thousands
of years to dominate archaeological assem- 
blages and that different wild populations seem 
to contribute to modern diversity, it is likely that 
wheat domestication was not as severe, abrupt, 
or geographically restricted as expected under 
the assumption of a “domestication bottleneck” 
(see Sect. 7.2). The presence of peculiar haplotypes
in an ancient emmer wheat sample from
Egypt suggests that genetic diversity may
have been lost after emmer wheat domestication
and dispersal to Egypt (Scott et al. 2019), in line
with what has been found for other species
(e.g. Trucchi et al. 2021). In the case of wheat, more
ancient samples are needed to determine the
association (or lack thereof) between domestication
and losses of genetic diversity.

Second, it is unclear whether there is a mono- 
phyletic “centre of domestication” for emmer 
wheat in the Northern Levant. The contribution 
of the Southern Levant gene pool to domestic 
emmer has been detected in several studies, but 
its origin remains unresolved. Whether emmer was
domesticated from a proto-domestic admixed
population, or whether early domestic populations
benefited from extensive gene flow from the wild,
remains to be determined. It has been proposed that
the high genetic similarity of modern domestic 
to Turkish wild emmer could be explained by a 
feralization of the very first proto-domestic pop- 
ulation (Civan et al. 2013; Oliveira et al. 2020). 
The analysis of wild and domestic samples from 
this region dating back to Pre-Pottery Neolithic 
and Neolithic could help determine the origin 
of the domestic pool, and its relationships with 
ancient and extant wild populations. 

The recent genetic identification of domesticated
T. timopheevii has triggered a re-evaluation
of its importance and abundance in the
archaeological record. This effort will be greatly
aided by a genetic survey of the modern wild
specimens, together with ancient seeds. In general,
it will be interesting to use ancient and
modern genetic data to compare the origins in
space and time of parallel domestication events
in wheat (emmer wheat, einkorn wheat, and T.
timopheevii).

Prospects for the analysis of DNA from fully
charred remains are poor, which limits direct
genetic analysis of some of the earliest
and most crucial events in wheat domestication.
Nevertheless, we expect that improvements
in the modelling of genomic evolution and the
increasing availability of waterlogged remains
will allow alternative scenarios to be tested, in addition to
addressing questions concerning adaptation and
spread of wheat.


7.5.2 Open Questions in Dispersal 
and Adaptation 


The dispersal of wheat was accompanied by 
adaptation to different environments, leading 
to the evolutionary success of this species. An 
interesting example is adaptation to altitude 
along certain dispersal routes. Wild emmer 
wheat from the Northern Levant, the closest to 
all domestic landraces, is always found at high 
altitude. Its dispersal towards Egypt entailed 
cultivation at sea level, but emmer wheat grown 
on the Ethiopian plateau is cultivated at high 
altitudes again. There are two possible routes of 
dispersal leading to Ethiopia, one through Africa 
and another through the Iranian plateau and the 
Arabian Peninsula. The first one would entail a
second adaptation event to high altitudes. Along the
other, emmer would always have been cultivated at high
altitudes, but it would require a longer dispersal
route. How did emmer wheat arrive in
Ethiopia? The analysis of desiccated specimens
from the Arabian Peninsula, Sudan, and ideally 
Iran could help to answer this question, as well 
as potentially unveiling genetic mechanisms for 
adaptation to high altitude. 


7.5.3 Open Questions in Hybridization 
and Speciation 


Archaeological data increasingly suggests that 
different wheat species were used in a complex 
geographical mosaic that shifted through time. 
Given that several wheat species, i.e. emmer, 
einkorn, naked wheats, and T. timopheevii (and
wild relatives) co-existed in the same area
for millennia, we can ask how much genetic 
exchange was ongoing in Neolithic settle- 
ments. While the vast majority of wheat culti- 
vated today is bread wheat, other free-threshing 
hexaploids such as the Indian dwarf wheat or 
the Yunnan wheat could have arisen from different
hybridization events, since the phylogeny of
the A and B genomes differs from that of the D 
genome (Zhou et al. 2020). Furthermore, forms 
such as T. compactum (Club Wheat) have been 
described (e.g. Kaplan et al. 1992), even though 
it is unclear whether these morphotypes are the 
product of different hybridization events or the
consequence of differential selective pressures. 
A comparison of the D subgenome in ancient 
hexaploids with modern Aegilops specimens 
could tackle this question and narrow down the 
geographic origin where these hybridizations 
occurred. 

Even more intriguingly, we can speculate 
whether introgressed genetic variation between 
different wheats was important for crop evolu- 
tion and adaptation to different environments 
such as adaptation to northern latitudes or to 
heat stress. Einkorn wheat and spelt were impor- 
tant crops in central and northern Europe. On 
the other hand, hexaploid free-threshing wheats 
such as Indian dwarf wheat and T. compactum 
are more commonly found in warm environ- 
ments. Studying changes in allele frequencies 
with the spread of these crops into new envi- 
ronments would identify candidate adaptive 
regions, whose phenotypic effects and useful- 
ness could be analysed through crossing and 
genetic mapping. Learning from the phyloge- 
netic relationship between ancient wheat speci- 
mens would greatly increase the power to detect 
the genomic regions conferring adaptation to 
those traits. 
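
The idea of scanning for allele-frequency changes between ancient and modern samples can be sketched as below. The function names, the dosage encoding (0, 1 or 2 copies of the alternative allele, `None` for missing calls) and the toy data are illustrative assumptions. A large shift on its own may reflect drift or sampling rather than selection, so the output is only a list of candidates for follow-up.

```python
def alt_allele_frequency(dosages):
    """Alternative-allele frequency from per-sample dosages (0, 1 or 2).
    Missing calls are given as None; returns None if nothing is called."""
    called = [d for d in dosages if d is not None]
    if not called:
        return None
    return sum(called) / (2 * len(called))


def rank_sites_by_shift(ancient, modern):
    """Rank shared sites by the absolute change in allele frequency.

    ancient, modern: dict mapping site -> list of dosages.
    Returns a list of (site, modern_freq - ancient_freq), largest
    absolute shift first; ties are broken by site name.
    """
    shifts = []
    for site in ancient.keys() & modern.keys():
        f_ancient = alt_allele_frequency(ancient[site])
        f_modern = alt_allele_frequency(modern[site])
        if f_ancient is None or f_modern is None:
            continue
        shifts.append((site, f_modern - f_ancient))
    return sorted(shifts, key=lambda s: (-abs(s[1]), s[0]))


# Hypothetical toy data for two sites shared by both groups.
ancient = {"1A:100": [0, 0, 0, None], "2B:200": [2, 2, 0]}
modern = {"1A:100": [2, 2, 2, 2], "2B:200": [2, 2, 0, 0], "3D:5": [0]}
ranked = rank_sites_by_shift(ancient, modern)
print(ranked)
```

With the small sample sizes typical of ancient DNA, the frequency estimates are noisy, so any ranking of this kind would need to be compared against a genome-wide background.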

Furthermore, besides the impact that archaeogenomics
has on our understanding of the past,
it also has the potential to set the basis for
future food security (Pont et al. 2019b), conservation
and breeding strategies, in the current
context of climate change (di Donato et al.
2018). During the dispersal of domestic plants, 
crops adapted to a multitude of environments, 
and aDNA can reveal genetic diversity present
in historical landraces but lost from the
modern domestic pool (e.g. Scott et al. 2019).
Detecting signals of positive selection in such
lost diversity may therefore be particularly
valuable, especially when it is the source of 
adaptations to extreme environments. After its 
identification, such diversity can be prioritized 
for preservation or introduced to modern culti- 
vars via breeding if still present in seed banks, 
landraces, or wild relatives (di Donato et al. 
2018). Plant aDNA studies can lead to the iden- 
tification of lost crops and their wild relatives, 
revealing their genetic makeup. Such knowl- 
edge could set the ground for de novo domesti- 
cations and ultimately aid in the diversification 
of our food system, which currently relies on 
a rather small number of domestic species 
(Estrada et al. 2018). Finally, aDNA can be
informative of past plant-pathogen interactions
and their co-evolution (e.g. Yoshida et al.
2013), providing valuable insights for crop
management (di Donato et al. 2018; Estrada
et al. 2018; Przelomska et al. 2020). 

In conclusion, archaeogenomics allows 
interrogation of a plethora of questions about 
wheat evolutionary history, such as popula- 
tion continuity and demographic changes 
through time, identification of climatic or cul- 
tural conditions that correspond to germplasm 
shifts, and relationships with other wheats. 
We expect these questions to be addressed in 
future aDNA studies. Overall, answering these 
questions will not only bring a deeper understanding
of wheat evolution, but will also help
answer questions about human cultural evolution
and trade.

Abstract


Durum and bread wheat are two related spe- 
cies with different ploidy levels but a high 
similarity between the common A and B 
genomes. This feature, which allows a con- 
tinuous gene flow between the two species, 
can be exploited in breeding programs to 
improve key traits in both crops. Therefore, 
durum wheat, despite covering only 5% of 
cultivated wheat worldwide, also represents 
an asset for the genetic improvement of bread
wheat. Tetraploid wheat, with a very large 
availability of wild and domesticated acces- 
sions, durum landraces, and cultivars, offers 
a large gene reservoir to increase the genetic 
diversity of A and B genomes in bread wheat. 
Moreover, thanks to the possibility of crossing 
durum wheat with Aegilops tauschii, synthetic 
hexaploid lines are generated which show 
a much larger genetic diversity also in the D 
genome compared to common wheat. The 
genome sequences of wild emmer, durum, 
and bread wheat provide powerful tools for gene
cloning and comparative genomics that will 
also facilitate the shuttling of genes between 
tetraploid and hexaploid wheats. 


Keywords 


Tetraploid · Synthetic wheat · Gene flow ·
Selection signatures · Wild germplasm


8.1 Introduction 


Durum wheat (tetraploid) and bread wheat 
(hexaploid) are two closely related species with 
potentially different adaptation capacities and 
only a few distinct technological properties that 
make durum semolina and wheat flour more 
suitable for pasta or bread and bakery products, 
respectively (Mastrangelo and Cattivelli 2021). 

The history of wheat began with the domestication
of wild emmer wheat (WEW, Triticum
turgidum ssp. dicoccoides) in the mountains
of the Fertile Crescent around 12-10 thousand
years ago, which gave rise to the first domesticated
form (domesticated emmer wheat, DEW,
T. turgidum ssp. dicoccum) and a first domestication
sweep related to the Brittle rachis (Btr)
trait. Then, human selection of natural mutations
at a few loci associated with the domestication
syndrome (i.e., Tg, tenacious glume, and
Q, compact spike) allowed for the selection of
wheat forms with square-shaped spikes, soft
glumes, and non-hulled grains, with
improved threshing efficiency, grain size and
uniformity, and productivity, and suitable for more
widespread cultivation. This phenotypic evolution
together with hybridization between different
forms (Matsuoka 2011) led to free-threshing sub- 
species (T. turgidum ssp. turgidum, ssp. turani- 
cum, ssp. polonicum, ssp. carthlicum, and ssp. 
durum, Fig. 8.1), all inter-fertile and sharing the 
same AABB genomic configuration. Hulled and 
free-threshing forms played a crucial role in the 
development of Mediterranean civilizations. They 
were at the base of early agricultural movements 
leading to agriculture systems based on tetraploid 
wheat. Among the different subspecies, durum 
wheat (T. turgidum ssp. durum) became the major
cultivated form of tetraploid wheat during the
last 3000 years. Nowadays, elite durum wheat
cultivars (DWCs) and durum wheat landraces
(DWLs) grow in different environments around
the Mediterranean Basin and are of major importance
for grain production and for staple food,
respectively (Fig. 8.2).

[Fig. 8.1 panel labels: ssp. turgidum, ssp. turanicum, ssp. polonicum]

E. Mazzucotelli et al.

The expansion of emmer cultivation toward 
the Transcaucasian corridor promoted an addi- 
tional natural hybridization of tetraploid forms 
with Aegilops tauschii (genome DD) and the 
emergence of the hexaploid bread wheat (T. 
aestivum L. ssp. aestivum, genome AABBDD) 
(Dubcovsky and Dvorak 2007). As a result,
durum and bread wheat share the A and B 
genomes and a long evolutionary history. 


8.2 Tetraploid Genetic Resources 

Most of wheat genetic diversity is contributed 
by tetraploid wheat genetic resources, particu- 
larly primitive tetraploids and wild and domes- 
ticated emmer. Indeed, the bottleneck effect, 
caused by the evolutionarily recent hybridization
events from which hexaploid wheat has evolved, 
has strongly limited its genetic diversity com- 
pared to tetraploid and diploid wheats (Cox 
1997). Therefore, tetraploid wheat germplasm
represents a strategic reservoir of alleles for both
durum and bread wheat improvement (Marone
et al. 2021). The International Maize and Wheat
Improvement Center (CIMMYT) and the
International Center for Agricultural Research in
the Dry Areas (ICARDA) have the largest collections
of tetraploid wheat, with approximately
27,500 and 22,500 accessions, respectively,
including 22,000 and 20,000 durum wheat
accessions. The ICARDA gene bank stores more
than 15,700 accessions of DWL and traditional
cultivars, while CIMMYT retains a larger collection
of domesticated emmer wheat (around
3000 accessions). Wide tetraploid wheat diversity
is also conserved by the National Small Grains
Germplasm Research Facility at USDA-ARS,
which holds approximately 12,500 accessions,
with a large representation of primitive
tetraploid subspecies (T. turgidum ssp. polonicum,
ssp. carthlicum, ssp. turanicum, and ssp.
turgidum) and more than 900 WEW accessions.
Many national gene banks also retain important
local germplasm resources which include historical
materials and the predominant remaining
landraces (Robbana et al. 2019).

Fig. 8.1 Examples of spikes of some subspecies belonging to the species Triticum turgidum

Fig. 8.2 Morphological and color variability for spikes of T. turgidum ssp. durum cultivars

Many studies have reported on the assembly
and characterization of panels of tetraploid
genotypes, from tens of genotypes to many
hundreds of entries of wider origin. These
germplasm collections are representative of
(i) subspecies, (ii) specific geographic regions
including local DWLs, historical cultivars, and
modern DWCs, and (iii) breeding programs.
They have been characterized for population 
structure, genome-wide molecular diversity, and 
linkage disequilibrium (LD)-decay rate as esti- 
mated with either multi-allelic (SSRs) and/or 
bi-allelic (DArT™, AFLPs) markers in earlier 
studies (Maccaferri et al. 2005; Mantovani et al. 
2008; Laidó et al. 2013; Roncallo et al. 2019)
or more recently with the Illumina iSelect 90K 
SNP array (Maccaferri et al. 2016; Saccomanno 
et al. 2018; N’Diaye et al. 2018) and the Axiom 
35K array (Kabbaj et al. 2017). All these stud- 
ies have generated an in-depth description of 
genetic diversity and differentiation within/ 
among subspecies and subgroups with a focus 
on both temporal and spatial trends, particularly 
targeting the cultivated and DWL germplasm. 
Following the publication of the first high- 
density SNP-based consensus map of tetra- 
ploid wheat (Maccaferri et al. 2015) and the 
first release of the durum wheat reference 
genome of cultivar SVEVO, many SSRs and 
iSelect 90K wheat SNPs have been anchored 
on the durum genome sequence, thus providing 
opportunities for genetic insights on relevant 
genomic regions (Maccaferri et al. 2019). Two 
recent collaborative studies provided germplasm
panels and in-depth analyses that support
detailed molecular-level knowledge of
historical losses of diversity.
The identification of favorable allelic combinations
progressively accumulated over repeated
breeding cycles is instrumental for more effective
management of breeding. In the first study,
the International Durum Wheat Sequencing 
Consortium supported a comprehensive analysis 
of genetic diversity in tetraploids which entailed 
the organization of the single seed descent 
Tetraploid Germplasm Collection (TGC) and 
its genetic diversity analysis using the Illumina 
iSelect 90K SNP array, projected onto the Svevo 
genome (Maccaferri et al. 2019; Fig. 8.3). 
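
As a small illustration of the linkage disequilibrium (LD) statistics mentioned above, the sketch below computes the squared correlation r² between two bi-allelic SNPs scored in inbred (homozygous) lines, where each line carries one haplotype. The 0/1 allele coding and the toy data are assumptions for illustration and are not taken from the cited panels.

```python
def r_squared(snp_a, snp_b):
    """Squared allelic correlation between two bi-allelic SNPs.

    snp_a, snp_b: equal-length lists of 0/1 alleles, one entry per
    inbred line (each line is treated as a single haplotype).
    Returns None if either SNP is monomorphic.
    """
    if len(snp_a) != len(snp_b) or not snp_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(snp_a)
    p_a = sum(snp_a) / n
    p_b = sum(snp_b) / n
    p_ab = sum(a * b for a, b in zip(snp_a, snp_b)) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom else None


print(r_squared([0, 0, 1, 1], [0, 0, 1, 1]))  # complete LD
print(r_squared([0, 0, 1, 1], [0, 1, 0, 1]))  # no association
```

An LD-decay curve is obtained by computing r² for many SNP pairs and plotting it against the physical or genetic distance between the pair.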

At the same time, the Wheat Initiative through
the durum wheat-expert working group supported
the development of the Global Durum Panel
(GDP), a collection targeting mainly the cultivated
durum germplasm, and fully genotyped with the
Illumina iSelect 90K wheat SNP array. The GDP
genetic diversity is described in Mazzucotelli et al.
(2020). The two collections have been assembled,
seed increased and made freely available
for research with the aim to facilitate the inventory,
molecular and phenotypic characterization,
and use of tetraploid genetic resources for durum
and bread wheat improvement. The collections
are maintained at ICARDA (Morocco), University
of Bologna (Italy), and CREA-Research Centre
for Genomics and Bioinformatics (Italy). Related
information and genotypic data are accessible at
the GrainGenes database
(https://wheat.pw.usda.gov/GG3/global_durum_genomic_resources).

Fig. 8.3 Tetraploid Germplasm Collection (TGC) composed of 1856 accessions
of tetraploid wheat. a Neighbor joining tree from
Nei’s genetic distances on the TGC. b Principal component
analysis plot of the TGC calculated based on
genome-wide linkage disequilibrium-pruned SNPs.
c Admixture analyses of the TGC with k (number of
populations assumed for the analysis) from 2 to 20.
Correspondence between branches and main tetraploid
wheat taxa/populations based on Nei’s genetic distances,
PCA and Admixture is indicated by color code (modified
from Maccaferri et al. 2019). The analyses of population
structure concur in highlighting five major subpopulations:
wild emmer wheat, domesticated emmer wheat,
durum wheat landraces from Ethiopia, durum wheat
landraces from Asian and Mediterranean regions, and
durum wheat cultivars. Color key entries:
Wild emmer wheat, North Eastern Fertile Crescent (WEW-NE) population
Wild emmer wheat, Southern Levant Fertile Crescent (WEW-SL) population
Domesticated emmer wheat (DEW); several populations
Domesticated emmer wheat; Ethiopian population (DEW-ETH)
Durum wheat landraces (DWL); several populations
Durum wheat landraces; Ethiopian populations (DWL-ETH)
T. turgidum ssp. carthlicum
T. turgidum ssp. polonicum
T. turgidum ssp. turanicum
T. turgidum ssp. turgidum
Modern durum wheat cultivars (DWC)


8.2.1 Wild Emmer Shows the Widest
Range of Adaptation
to Environment and Retains
the Highest Level of Genetic
Diversity Genome-Wide


WEW is an annual, predominantly self-polli- 
nating allotetraploid species with large, elon- 
gated grains and brittle ears disarticulating at 
maturity into spikelets. Molecular data indi- 
cate that WEW is about 500,000 years old, 
resulting from a hybridization event between 
two wild diploid grasses that took place in the 
Fertile Crescent, probably in the vicinity of Mt. 
Hermon and the catchment area of the Jordan 
River where a center of WEW diversity has been 
reported (Dvorak and Akhunov 2005; Feldman 
and Kislev 2007). WEW is naturally distributed in the Near East Fertile
Crescent, with two major races that are geographically, morphologically,
and genetically distinct: (1) the eastern race, found in the northeastern
part of the Fertile Crescent, with main populations in north- and
central-eastern Turkey, western Iran, and northern Iraq; and (2) the
western race, found in the southern Levant, including Syria, Lebanon,
Jordan, and Israel.
Apart from dense and frequent natural popu- 
lations found in the upper Jordan valley catch- 
ment area in Israel, and massive stands on the 
basalt slopes of the Karacadag (Şanlıurfa and 
Diyarbakir provinces) in Turkey, WEW cur- 
rently displays a patchy distribution in the 
region, with populations being semi-isolated 
or isolated. Its habitats range in altitude from 100 m below sea level
up to 1800 m above sea level and span very different climates, from the
cool and humid Karacadag Mountains to the hot and dry valleys of Israel
(Nevo et al. 2002).

The domestication dynamics of WEW are still unclear, though several
pieces of the puzzle have been identified. At present, the south-eastern
Turkish subpopulations are more closely related to DEWs than any other
wild emmer populations, but the monophyletic origin of DEW is still
debated. Whole genome analysis based on multi-locus assays pointed out
that the wild emmer populations from the Karacadag region west of
Diyarbakir and from the Sulaimanyia region along the Iraq/Iran border
appeared the most closely related to DEW (Ozkan et al. 2011), with a
further molecular indication in favor of the Diyarbakir WEW (Luo et al.
2007). Later analyses supported a reticulated origin, with DEW sharing
phylogenetic signals with wild populations from all parts of the wild
range (Civan et al. 2013). Recently, Nave et al. (2021) focused on the
Brittle Rachis gene (BTR1), a fundamental gene for wheat domestication
present in two homeologous copies, BTR1-A and BTR1-B. Haplotype
sequences showed that, for the BTR1-A locus, the domesticated haplotype
BTR1-A-hap11 is highly related to the WEW founder haplotype
BTR1-A-hap10, which is ubiquitous in both the northern and southern
Fertile Crescent, while for the BTR1-B copy, the domesticated haplotype
BTR1-B-hap8 was derived from the wild haplotype BTR1-B-hap7, found only
in the southern Levant. This indicated that at least part of the
domestication process of WEW occurred outside the “core area” in the
northern part of the Fertile Crescent (Nave et al. 2021).

Each WEW race is further subdivided genetically into subpopulations,
with a pattern that mirrors the geographic origin (Luo et al. 2007;
Ozkan et al. 2011; Badaeva et al. 2015; Maccaferri et al. 2019). Up to
12 clearly distinct populations and subpopulations were identified by
Admixture analysis in the TGC (Maccaferri et al. 2019; Fig. 8.4a).
Moreover, it was shown that populations belonging to the eastern race
were less diverse than those collected in the Levant. Notably, the
genetic structure of the western population also correlates with
differences in morphological features. Indeed, most of the western
populations belong to the horanum botanical variety and include
accessions with a slender habit, while northern Israel is specifically
inhabited by the subpopulation judaicum, which includes tall accessions
with an upright habit and wide spikes with large grains, and which is
more fertile than the rest of the WEWs in the western area (Poyarkova
et al. 1991). Ecological variables also play an important role in
shaping the genetic structure of WEW. Indeed, allele frequencies at loci
under positive selection were significantly correlated with
eco-geographical factors (e.g., geographic location, temperature, water
availability, singly or in combination), suggesting that natural
selection could have created regional divergence in WEW (Ren et al.
2013). An example of natural selection shaping WEW genetic diversity is
provided by Yr15, a broad-spectrum disease resistance gene cloned in WEW
and belonging to the family of tandem kinase-pseudokinase proteins
(Klymiuk et al. 2018). Northern regions of Israel show climatic
conditions more favorable for stripe rust pathogen development than the
southern regions. A large screening of wild emmer natural populations
confirmed that the Yr15 gene is present only in northern Israeli
populations and is distributed along a narrow mountain ridge of about
100 km from the Mt. Carmel to the Mt. Hermon regions, mainly at
elevations higher than 500 m above sea level (Klymiuk et al. 2019a; He
et al. 2020). Thus, it seems that the selection pressure exerted by the
pathogen affects host-parasite interactions and co-evolution and shapes
the distribution of resistance genes among wild emmer populations.

Fig. 8.4 Admixture analysis of wild emmer, emmer, and landraces included
in the TGC, represented as bar plots of Q membership coefficients
(modified from Maccaferri et al. 2019). More in detail results of: a
wild emmer


8.2.2 Emmer Wheat, the First 
Domesticated Wheat 


DEW was a widely cultivated staple crop in the Near East, ancient
Mesopotamia, and Egypt for over 7000 years during the Neolithic period.
The decline started in Turkey during the Bronze Age about 5000 years ago
when it was replaced by naked wheats (T. durum and/or T. aestivum),
while in Europe its cultivation continued until about 2000 years ago
with a long and slow decline. Today, DEW can be found only in marginal
areas and some isolated traditional farming communities in the Balkans
and Mediterranean countries, Iran, Armenia, Ethiopia, Yemen, Oman, and
India. Recently, it has been rediscovered by the organic food industry
for bread and cookie production.

Geographical expansion of DEW was inti- 
mately associated with historical human migra- 
tions and spread from the Fertile Crescent with 
a typical star-like dispersal mode. Indeed, four 
major diffusion routes out of the Fertile Crescent 
have been postulated (Badaeva et al. 2015). The 
expansion of DEW was a long and complex pro- 
cess in which emmer genotypes became adapted 
to new habitats and climates. The genetic struc- 
ture of DEW populations was affected, among 
other factors, by exchange of seed stock dur- 
ing migration and by gene flow between wild 
and domesticated wheats or between different 
locally adapted DEW populations. 

Few studies have focused on the genetic structure of the DEW germplasm.
A cluster analysis based on karyotypic information from a comprehensive
collection of 446 DEW lines from 47 countries identified four groups
(Balkan, Asian, European, and Ethiopian) that allowed the authors to
postulate four major diffusion routes of the crop out of the Fertile
Crescent (Badaeva et al. 2015). Notably, although specifically evolved
in certain geographic regions, populations of DEW usually included
representatives of more than one karyotypic group at different
frequencies. This mixture of karyotypic groups probably originated from
multiple crop introductions/exchanges by successive waves of colonizing
civilizations, which swept across Europe, the Mediterranean, and Asia.
This clustering partially agrees with the population structure
highlighted by Liu et al. (2017a, 2017b) in a collection of 176 spring
accessions representing a large portion of the worldwide genetic
diversity in the gene pool of cultivated spring emmer wheat. Three major
groups were recognized: an “African subpopulation” with accessions
mostly from Ethiopia, Kenya, and Morocco, a “European subpopulation”
with accessions from southern and western Europe, and an “Asian
subpopulation” grouping accessions from eastern and western Asia. As for
the TGC, it included up to 335 unique, non-admixed DEWs comprehensively
sampled from gene banks worldwide. Their analysis revealed a high level
of stratification, with at least six clearly distinct main populations,
which evolved following human-driven dispersal along the main migration
routes already described, and up to 18 subpopulations (Maccaferri et al.
2019; Fig. 8.4b). The six main populations correspond to: (1–2) two
distinct and ancestral populations from the southern Levant, (3) a
population closely related to the southern Levant populations but
distinct and evolved in southern Europe, (4) a population evolved along
the Turkey-to-Balkans dispersal route, (5) a distinct population evolved
along the Turkey-to-Transcaucasia/Iran route, and (6) an early-separated
population that evolved and spread from Oman to India/Ethiopia.
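
As an illustration of how accessions can be sorted into populations, or
flagged as admixed, from Q membership coefficients such as those plotted
in Fig. 8.4, the following minimal Python sketch assigns each accession
to the cluster with its largest Q value when that value reaches a
cutoff. The 0.7 cutoff, the accession names, and the Q values are
hypothetical and are not taken from Maccaferri et al. (2019); the
function names are introduced here for illustration only.

```python
from typing import Dict, List, Optional, Tuple


def assign_population(q: List[float], threshold: float = 0.7) -> Optional[int]:
    """Return the index of the cluster with the largest Q value,
    or None when that value is below the (assumed) cutoff."""
    best = max(range(len(q)), key=lambda k: q[k])
    return best if q[best] >= threshold else None


def split_by_admixture(
    q_matrix: Dict[str, List[float]], threshold: float = 0.7
) -> Tuple[Dict[int, List[str]], List[str]]:
    """Separate accessions into per-cluster lists and a list of admixed ones."""
    assigned: Dict[int, List[str]] = {}
    admixed: List[str] = []
    for name, q in q_matrix.items():
        pop = assign_population(q, threshold)
        if pop is None:
            admixed.append(name)
        else:
            assigned.setdefault(pop, []).append(name)
    return assigned, admixed


# Hypothetical Q coefficients for K = 3 (each row sums to 1).
example_q = {
    "acc_1": [0.95, 0.03, 0.02],
    "acc_2": [0.10, 0.85, 0.05],
    "acc_3": [0.40, 0.35, 0.25],  # no cluster reaches the cutoff
    "acc_4": [0.05, 0.05, 0.90],
}
assigned, admixed = split_by_admixture(example_q)
print(assigned)
print(admixed)
```

A stricter cutoff enlarges the admixed class and leaves fewer, more
homogeneous accessions in each population, so the number of "non-admixed"
accessions depends directly on this choice.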

All these data pointed to the presence of a considerable level of
diversity that evolved naturally after domestication, in adaptation to
new environments, and that is well differentiated from that of the
native Fertile Crescent (southern Levant and Turkey). The only
genome-wide association study (GWAS) reported so far on DEW also
indicated high genetic diversity. Indeed, the same collection proved to
be a rich source of stripe rust resistance loci very useful for wheat
improvement (Liu et al. 2017a, 2017b). Among the 51 resistance loci,
including genes effective in multiple field environments or against
multiple races, a large proportion mapped far from previously reported
stripe rust resistance genes or QTLs and thus represent novel resistance
loci. Notably, African germplasm showed a higher frequency of stripe
rust-resistant genotypes than the other two subpopulations.


8.2.3 Variable Human
and Environmental Pressures
Have Affected Divergence
of Durum Wheat Landraces


As for DEW, the southern Levant is the center of origin of T. turgidum
ssp. durum (Vavilov 1951; Feldman 2001). The first evidence of durum
wheat dates to approximately 7500–6500 years ago (Faris 2014). Durum
wheat then spread along the same migration routes already described for
DEW, but through substantially independent pathways, with limited
evidence of gene flow and/or admixture between DEW and DWL (Maccaferri
et al. 2019).

The dispersal routes moved durum wheat west throughout the Mediterranean
Basin up to the Iberian Peninsula, probably via trading by Phoenician
merchants and via the caravan routes along the Sahara desert or the
North African coasts (Bozzini 1988), and east through the Silk Road to
Asia (Waugh 2010). Following another early dispersal route to Ethiopia,
an independent origin of durum wheat by a separate domestication of
naked emmer has been suggested to have occurred in Ethiopia and to have
given rise to T. durum ssp. abyssinicum, which is morphologically
different from other durum wheat accessions, with uncompact spikes and
small purple seeds (Mengistu et al. 2015, 2016). In addition, natural
and anthropogenic selection in DEW during human migration resulted in
the establishment of local DWLs specifically adapted to a diversity of
agro-ecological zones (Nazco et al. 2012).

Local landraces were progressively abandoned starting from the early
1970s, as they were replaced by the improved, more productive, and
genetically uniform semi-dwarf cultivars derived from the Green
Revolution. Nevertheless, empirical breeding aimed at exploiting the
phenotypic variability of DWLs resulted in traditional varieties that
are still preferred by smallholder farmers in the traditional farming
systems of rural/marginal areas, where modern intensive systems cannot
be adopted and/or where this germplasm provides the required higher
stress tolerance. These traditional DWLs are usually tall plants and are
often cultivated for both grain and straw, so that the straw can still
be harvested when grain yield is too low or fails altogether because of
high temperature and drought. Thus, crop diversity managed by
smallholder farmers in traditional agro-systems is the outcome of
historical and current processes interacting at various spatial scales
and influenced by local factors such as farming practices and
environmental constraints. Due to their evolutionary dynamics, landraces
strongly represent the diversity of semiarid and marginal conditions.
Indeed, evidence supports the hypothesis that DWLs harbor the largest
source of biodiversity within the cultivated durum germplasm, including
documented resilience to abiotic stresses and resistance to pests and
diseases, which could be used to enrich the modern wheat genetic
repertoire for the improvement of commercially valuable traits (Lopes
et al. 2015).

Many studies have focused on panels of landrace accessions from a
restricted country/area, such as those from southern Italy (Marzario
et al. 2018; Mangini et al. 2018), Iran (Seyedimoradi et al. 2016),
Spain (Giraldo et al. 2016; Ruiz et al. 2012), Tunisia (Robbana et al.
2019; Slim et al. 2019; Ouaja et al. 2021), Turkey and Syria (Baloch
et al. 2017), Palestine, Jordan and Israel (Abu-Zaitoun et al. 2018),
Morocco (Kehel et al. 2013; Sahri et al. 2014), and Ethiopia (Mengistu
et al. 2016; Alemu et al. 2020).

Different drivers of population restructuring have emerged from the
analysis of these collections. A collection of 91 DWLs originating from
a wide range of ecological conditions of soil, temperature, and water
availability in Turkey and Syria showed a grouping pattern not
associated with the geographical distribution of durum wheat, suggesting
a high mixing of Turkish and Syrian landraces due to a large exchange of
genetic material among farmers, as a way to compensate for the lack of
commercial varieties (Baloch et al. 2017). High admixture among
landraces was also observed in Ethiopia, although the country is
characterized by a wide range of agro-ecological conditions coupled with
diverse farmers' cultures. Indeed, neither the clustering of 167 DWLs by
Alemu et al. (2020) nor that of 287 Ethiopian DWLs by Mengistu et al.
(2016), collected from the major wheat-growing areas of the country,
reflected their geographical origin, suggesting that admixture arose
from historical seed exchanges involving regional and countrywide
farming communities in Ethiopia.
Moreover, Seyedimoradi et al. (2016) reported a low correlation between
genetic distances and geographical origin in a small panel of DWLs from
different zones of Iran. Notably, genetic analysis showed that the
country of origin did not leave any genetic footprint within the core
collection of DWLs from Jordan, Palestine, and Israel, despite the
historical sociopolitical barriers present in this area during the last
decades (Abu-Zaitoun et al. 2018). However, in the latter case, the lack
of separation between neighboring countries might reflect adaptation to
similar semiarid conditions. Conversely, the fingerprinting of a
collection of the National Gene Bank of Tunisia (NGBT) of traditional
varieties from different Tunisian agro-ecological zones found a strong
genetic stratification from north to south in Tunisia (Slim et al.
2019). Indeed, five subpopulations were identified, two of which
appeared more strongly represented in germplasm collected in central and
southern Tunisia, where environmental conditions at critical development
phases of the plant are harsher. Notably, these subpopulations were
underrepresented in modern varieties, which were instead prevalent in
the north, suggesting that traits for breeding more resilient varieties
might be present in central and southern Tunisian traditional varieties.
In Morocco, a stratification of the genetic diversity according to
agro-ecological conditions (geography, but also water and temperature
regimes) was recorded, related to the two distant regions Pre-Rif and
Atlas Mountains, which display very different environmental, cultural,
and agronomic conditions (Kehel et al. 2013; Sahri et al. 2014).
However, within each region, only a few patterns emerged from the
genetic and morphological characterization, as if distance did not
represent a consistent barrier to genetic exchange (Sahri et al. 2014).
Different hypotheses can explain these results, ranging from unconscious
mixing by farmers in threshing areas, to the unreliability of seed
exchange networks (e.g., seed lots that do not correspond to the
declared names), to farmers' limited interest in durum wheat cultivation
and seed production. These mixtures create strong opportunities to
generate diversity through cross-fertilization and recombination, but
would also homogenize the pool of traditional varieties in the absence
of divergent human or environmental pressures that maintain some
differentiation between them.

In the TGC (Maccaferri et al. 2019), up to 947 unique durum and
durum-related accessions with relatively low admixture were analyzed for
population structure (Fig. 8.4c). The results showed six main
populations corresponding to: (1) a local Turkey-to-Levant (in
particular Syria) population, (2) the main Southern Levant-to-North
Africa and Southern Europe (Italy, Spain, and Portugal) migration route,
(3) a highly differentiated Ethiopian population subdivided into two
subpopulations, (4) a Turkey-to-Transcaucasia/Russia route, (5) a
clearly distinct T. turgidum ssp. turanicum population developed in
Iran/Iraq up to Afghanistan, and (6) a localized Greece-to-Balkans
population including representatives of T. turgidum ssp. turgidum.

Other comprehensive studies combining molecular and phenotypic data
considered wide panels of landrace accessions from a larger geographic
area, such as the collection of 172 DWLs from 21 Mediterranean countries
characterized by Royo et al. (2014) and Soriano et al. (2016, 2018).
Germplasm from the Mediterranean area is of interest for traits relevant
to adaptation to climate change, since in this region wheat is mainly
grown under rain-fed conditions and yield is often constrained by the
water and heat stresses that are common during the grain-filling period
because of the low and unpredictable seasonal rainfall. The above
collection highlighted an evident relationship between genetic
stratification and eco-geographic patterning, which was suggested to be
the result of different physiological and genetic strategies to sustain
yield according to the prevalent climate conditions. Indeed, the 172
DWLs showed a genetic structure related to an eastern-western
geographical pattern formed by four clearly defined groups: eastern
Mediterranean, eastern Balkans and Turkey, western Balkans and Egypt,
and western Mediterranean, in agreement with the dispersal pattern of
wheat from east to west in the Mediterranean Basin (Soriano et al.
2016). Interestingly, this study also showed a reliable relationship
between genetic and phenotypic population structures, the latter being
based on yield, yield components, and crop phenology-related traits. A
high number of spikes and harvest index were recorded in DWLs from the
eastern Mediterranean Basin, in agreement with the findings of previous
studies (Moragues et al. 2006; Royo et al. 2014), which demonstrated
that durum wheat yield under warm and dry environments is determined
mostly by the number of spikes per unit area, whereas kernel weight
predominantly influences grain production in colder and wetter
environments. Using a subset of this collection, Soriano et al. (2018)
identified 23 marker alleles with a differential frequency in DWLs from
the east and west regions of the Mediterranean Basin, which affected the
mentioned agronomic traits. Eastern DWLs had higher frequencies than the
western ones of alleles for increasing the number of spikes (chr. 1B),
grains per m² (chr. 7B), grain-filling duration (several marker-trait
associations), reduced cycle length, and lighter grains (chr. 4A, 5B,
and 6B).
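
The idea of marker alleles with a differential frequency between eastern
and western landraces can be sketched in a few lines of Python. The
example below is only a toy contrast of allele frequencies between two
groups of inbred lines coded 0/1; the marker names, genotype calls, and
the 0.3 minimum difference are hypothetical, and no statistical test is
applied, so it should not be read as the procedure used by Soriano
et al. (2018).

```python
from typing import Dict, List


def allele_frequency(calls: List[int]) -> float:
    """Frequency of the allele coded 1 among inbred lines coded 0/1."""
    return sum(calls) / len(calls)


def differential_markers(
    east: Dict[str, List[int]],
    west: Dict[str, List[int]],
    min_diff: float = 0.3,
) -> Dict[str, float]:
    """Return markers whose allele frequency differs between the two
    groups by at least min_diff (east minus west)."""
    selected: Dict[str, float] = {}
    for marker, east_calls in east.items():
        diff = allele_frequency(east_calls) - allele_frequency(west[marker])
        if abs(diff) >= min_diff:
            selected[marker] = diff
    return selected


# Hypothetical calls for two markers in four eastern and four western lines.
east = {"m1": [1, 1, 1, 0], "m2": [1, 0, 1, 0]}
west = {"m1": [0, 0, 1, 0], "m2": [1, 0, 0, 1]}
result = differential_markers(east, west)
print(result)
```

In a real panel, the raw frequency contrast would normally be
complemented by a significance test and a correction for population
structure.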


8.2.4 Main Breeding Gene Pools 
Within the DWC Germplasm 


Durum wheat genetic makeup became more complex at the beginning of the
twentieth century, when conscious breeding started by applying
artificial hybridization and selection pressure for commercial purposes
(Autrique et al. 1996; Pecetti and Annicchiarico 1998; De Vita et al.
2007). The first durum wheat breeding program, set up in southern Italy
by Nazareno Strampelli, was initially based on the selection of pure
lines from local landraces (Scarascia Mugnozza 2005), which in 1915 led
to the release of CAPPELLI, a pioneer cultivar that had a major global
impact in the following years and to which many modern varieties can be
traced back. A second major impact was provided by the deployment of
lines carrying dwarfing genes to increase harvest index. This was first
carried out by Nazareno Strampelli in Italy using Rht8 from the Japanese
variety AKAKOMUGI, and several years later by Norman Borlaug in Mexico
using Rht-B1b, also from a Japanese variety, NORIN10. The dwarfing gene
Rht-B1b was successfully transferred to the durum wheat CANDO in the
1960s and widely used in durum wheat breeding (Quick et al. 1976). The
last decades have been characterized by several hybridizations between
different breeding programs or with relatives, aimed at increasing
productivity while ensuring genetic diversity, and by mega-cultivars
that have crossed the boundaries of their country of origin (Ren et al.
2013). Thus, although Autrique et al. (1996) observed that a limited
number of ancestral lines have contributed largely to the development of
the modern durum wheat materials and that the molecular fingerprints of
a few ancestors accounted for most of the molecular diversity detected
in the cultivated gene pool, a more complex network has emerged from
studies on the genetic diversity pattern of the most recent durum wheat
germplasm.

Numerous studies reported on the characterization of diverse panels made
of DWCs (Maccaferri et al. 2005, 2006, 2011; Reimer et al. 2008;
Condorelli et al. 2018) or related to a breeding program (N'Diaye et al.
2018). These works have identified a few main gene pools reflecting the
genetic basis and breeding strategies involved in their development.
Maccaferri et al. (2005) subdivided the DWCs into six major gene pools:
(1) an Italian group, which includes varieties selected and released in
Italy, (2) a CIMMYT-ICARDA group, with hallmark accessions derived from
the CIMMYT-ICARDA breeding program and released in Mexico, Spain, Italy,
and several West Asia and North Africa countries, (3) a French group,
encompassing lines released by French breeders and well adapted to a
range of environments throughout central and south Europe, (4) an
Austrian-Australian group, derived from Austrian or Australian breeding
programs, (5) a North American group, with accessions selected in the
Great Plains of the USA and Canada, and (6) a southwestern US group,
consisting of representatives of the germplasm cultivated in the
southwestern region of the US under irrigation and commonly referred to
as desert durum.

Fig. 8.5 Neighbor joining tree of the Global Durum Panel (GDP)
collection (modified from Mazzucotelli et al. 2020)

More recently, the analysis of wider collections, as reported by Kabbaj
et al. (2017) and by Mazzucotelli et al. (2020), provided the basis for
separating the two CGIAR breeding programs (CIMMYT and ICARDA), as well
as for defining a subgroup of highly admixed varieties derived from the
exchange of materials among different breeding pools. This high exchange
of materials was confirmed by the analysis of genetic diversity on the
Global Durum Panel (GDP) clusters based on geography and breeding
program of origin. Indeed, it was shown that most diversity remained
among individuals within clusters, and only 13% of the total genetic
variance could be captured by groups (Mazzucotelli et al. 2020). These
insights also indicated that a good level of genetic diversity remains
available within the breeding groups for direct exploitation, and that
there is even greater potential when considering exchanges between
breeding groups. The 473 modern cultivars/breeding lines of the GDP were
grouped into nine distinct groups organized as follows: old Italian
elite, ICARDA, CIMMYT, Spanish/Argentinian elite germplasm, US desert
durum, Australian elite, and, at the opposite end of the diversity
range, the North American, Canadian, and French germplasm, the latter
including the Evolutive Population (EPO, David et al. 2014) (Fig. 8.5).
Founders of these modern cultivars were identified in western Asian
durum landraces, North African Mediterranean landraces, and central
Asian, Turkey-to-Transcaucasia/Russian landraces, indicating that some
landraces and durum primitive groups did not contribute to the genetic
makeup of modern cultivars.
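
To make the notion of "variance captured by groups" concrete, the
following Python sketch partitions the total sum of squared deviations
of a genotype matrix into among-group and within-group components and
reports the among-group fraction. This is a simplified analogue of an
analysis of molecular variance, written under the assumption that
accessions are rows of numeric marker scores; the function names and the
toy data are hypothetical, and the sketch does not reproduce the
analysis of Mazzucotelli et al. (2020).

```python
from typing import Dict, List

Matrix = List[List[float]]


def mean_vector(rows: Matrix) -> List[float]:
    """Column-wise mean of a list of equally long rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def among_group_fraction(groups: Dict[str, Matrix]) -> float:
    """Fraction of the total sum of squares explained by group means."""
    all_rows = [row for rows in groups.values() for row in rows]
    grand = mean_vector(all_rows)
    ss_total = sum(sq_dist(row, grand) for row in all_rows)
    ss_among = sum(
        len(rows) * sq_dist(mean_vector(rows), grand)
        for rows in groups.values()
    )
    return ss_among / ss_total


# Toy example: two groups of two accessions scored at two markers.
toy_groups = {
    "A": [[0.0, 0.0], [0.0, 1.0]],
    "B": [[1.0, 1.0], [1.0, 0.0]],
}
fraction = among_group_fraction(toy_groups)
print(fraction)
```

A low among-group fraction, as in the 13% reported above, means that
most of the variation is found among individuals within groups.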

Interestingly, the rate of LD decay, an important feature to be assessed
when implementing GWAS, differs markedly between modern durum cultivars
and landraces (Fig. 8.6). On average, LD decays to r² = 0.3 (a generally
accepted reference threshold in GWAS) within a physical distance of
4.21 Mb in modern durum, whereas this distance decreases to 0.94 Mb in
landraces. Thus, the landrace germplasm potentially offers higher
genetic resolution in QTL mapping than modern varieties. However, these
metrics are highly differentiated when referring to pericentromeric
versus distal chromosome regions, with the physical-to-genetic ratio
showing differences of orders of magnitude (Maccaferri et al. 2019).
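
The LD decay distance discussed above can be illustrated with a minimal
Python sketch. It computes pairwise r² between biallelic markers scored
0/1 in inbred lines, averages r² in bins of physical distance, and
returns the midpoint of the first bin whose mean falls below the chosen
threshold. The binning approach, the function names, and the toy data
are assumptions made for illustration; published analyses usually fit a
smooth decay curve instead, and this is not the procedure of
Mazzucotelli et al. (2020).

```python
from typing import Dict, List, Optional, Tuple


def r_squared(a: List[int], b: List[int]) -> float:
    """Squared correlation between two biallelic loci coded 0/1."""
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(x * y for x, y in zip(a, b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # monomorphic marker: LD undefined, treated as 0 here
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


def ld_decay_distance(
    positions: List[float],
    genotypes: List[List[int]],
    bin_size: float,
    threshold: float = 0.3,
) -> Optional[float]:
    """Midpoint of the first distance bin with mean r2 below threshold."""
    bins: Dict[int, Tuple[float, int]] = {}
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            dist = abs(positions[j] - positions[i])
            key = int(dist // bin_size)
            total, count = bins.get(key, (0.0, 0))
            bins[key] = (total + r_squared(genotypes[i], genotypes[j]), count + 1)
    for key in sorted(bins):
        total, count = bins[key]
        if total / count < threshold:
            return (key + 0.5) * bin_size
    return None  # r2 never drops below the threshold in the data


# Toy data: three markers (positions in Mb) scored in eight lines.
toy_positions = [0.0, 0.5, 3.0]
toy_genotypes = [
    [0, 0, 1, 1, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]
decay = ld_decay_distance(toy_positions, toy_genotypes, bin_size=1.0)
print(decay)
```

A shorter decay distance, as reported for landraces, means that a
marker-trait association delimits a smaller physical interval.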
Through GWAS on the above germplasm collections, specific phenotypic
traits and/or the frequency of alleles at known loci for critical
adaptation traits (vernalization requirement, response to photoperiod,
heading date, plant height), for disease resistance, and for root
morphology have been related to the population structure. For instance,
Maccaferri et al. (2011) identified different patterns of allele
frequency at the three major genes for wheat phenology and plant
architecture (Vrn-A1, Ppd-A1, and Rht-B1) across the five subgroups
present in a collection of elite durum mostly composed of Mediterranean
germplasm. Most of the accessions were vernalization-insensitive and
semi-dwarf, as expected for elite durum wheat materials. The
vernalization-sensitive allele vrn-A1 was present in only six
accessions, all but one (CLAUDIO) from ICARDA germplasm. Interestingly,
in a subgroup made of cultivars bred for the semiarid areas, most
genotypes carried the wild-type Rht-B1a allele and an almost fixed
Ppd-A1 wild-type allele. In addition to conferring a tall phenotype, the
wild-type Rht allele also increases the coleoptile length, hence
allowing for deeper sowing and better exploitation of soil moisture,
thus making genotypes better suited for drought-prone areas (Rebetzke
et al. 2007). As to the Ppd locus, the wild-type allele confers lateness
through photoperiod sensitivity in the Mediterranean environments, while
the photoperiod-insensitive alleles dominate in most of the modern
germplasm accessions. The same collection was evaluated for root system
architecture, with a focus on root growth angle, which is considered a
fundamental trait to enhance the genetic capacity of the plant to
acquire soil resources (Sanguineti et al. 2007; Maccaferri et al. 2016).
Indeed, a narrow and deep root ideotype, in contrast to a shallow one,
can contribute to drought resistance and was found to be correlated with
grain yield under harsh rain-fed conditions. Alleles contributing a
narrow root growth angle were found to be present at relatively high
frequencies in the modern high-yielding germplasm, including the most
recent cultivars from CIMMYT/ICARDA programs, the Italian, and the
desert durum cultivars. The major root growth angle QTL detected on
chromosome 6AL (Maccaferri et al. 2016) was also reported in a study on
Ethiopian germplasm (Alemu et al. 2021).

Fig. 8.6 Genome-wide linkage disequilibrium (LD) decay in respect to
physical distance in the GDP collection for the two main groups of: a
modern durum wheat germplasm and b durum landraces; critical distances
are provided to three different r² threshold values (0.5, 0.3, and 0.09
as for unlinked markers) (Figure from Mazzucotelli et al. 2020)

Resistance to diseases is a relevant trait for durum varieties. Notably,
the analysis of panels made of cultivars has frequently identified the
leaf rust resistance gene Lr14, a locus originally transferred from the
DEW YAROSLAV to common wheat (McFadden 1930), then identified in the
Chilean DWC LLARETA-INIA and in diverse loosely related genetic
materials, such as the CIMMYT line Somateria (Herrera-Foessel et al.
20083), the Italian cultivars COLOSSEO (Maccaferri et al. 2008), and
CRESO (Marone et al. 2009). The resistant haplotype at the Lr14 locus
was found in many cultivars from the Italian, CIMMYT, and ICARDA
breeding programs, suggesting it was the most important source of
resistance to leaf rust exploited by durum breeders. Another interesting
example has been provided by the breeding for resistance to Hessian fly
(Bassi et al. 2019). A major locus was identified on chromosome 6B in a
group of Moroccan DWCs related to the cultivar NASSIRA. Pedigree
analysis demonstrated the kinship of these lines and traced back the
origin of the locus to a resistant T. araraticum accession, which had
been used to introgress the resistance into locally adapted elite lines.


8.2.5 Selection Signatures 


An exhaustive genome-wide analysis of changes 
in genetic diversity imposed by thousands 
of years of empirical selection and breeding 
was enabled by the Global Tetraploid Wheat 
Collection consisting of 1856 accessions rep- 
resenting the four main germplasm groups 
involved in tetraploid wheat domestication his- 
tory and breeding (T. dicoccoides, T. dicoccum, 
DWLs, and DWCs) (Maccaferri et al. 2019). For 
each germplasm group, the pattern of diversity 
was assessed through a SNP-based gene diver- 
sity index (Fig. 8.7), then different metrics were 
used to detect selection signatures between evo- 
lutionary transitions, including both diversity 
reduction, and divergence/differentiation of allele 
frequency. WEWs showed the highest average 
diversity with only two pericentromeric regions 
(chr. 2A and 4A) with a lower-than-average 
diversity, thus the authors referred to WEW as 
the reference for assessing the reduction of diver- 
sity associated with domestication and breeding 
in tetraploid wheat. In total, 104 pericentromeric 
(average size 107.7 Mb) and 350 non-pericentromeric 
(average size 11.4 Mb) genomic regions 
showed co-occurring signals of selection in 
one or more evolutionary transitions. 

Compared to WEW, each of the subsequently 
domesticated/improved germplasm 
groups showed several strong diversity depletions 
that arose independently and were progressively 
consolidated through domestication 
and breeding. Consequently, the genome of 
DWCs revealed numerous regions showing 
near fixation of allelic diversity. Exceptions 
were observed for chromosomes 2A and 3A in 
the pericentromeric region where the DWCs 
showed an increased diversity as compared to 
DWL and DEW groups. The pericentromeric 
regions showed extensive signals of divergence/selection 
in the WEW-DEW and WEW-DWL 
transitions, which highlights that most of the loss 
of diversity and divergence signatures occurred 
during domestication. The combination of this 
analysis with the availability of the durum wheat 
reference genome allowed higher resolution 
analysis for the non-pericentromeric regions 
based on a comparative alignment between 
selection signals and wheat genes and QTLs 
relevant for domestication/improvement. For 
instance, the transition from DEW to durum 
wheat showed different depletion of diversity 
related to technological quality improvements. 
Indeed, the locus Glu-A1, coding for 
glutenin subunits and located at 500.8 Mb on 
chromosome 1A, which was reported to be 
nearly fixed in modern germplasm for the null 
allele, was associated with a strong local signal of 
diversity reduction. Analogously, other extreme 
reductions in diversity were found colocated 
with grain yellow pigment content loci, including 
Psy-B1. Interestingly, among a set of 41 
previously cloned loci that have most 
probably been the target of selection, many colocated 
with regions marked by strong selection metrics. 
Intriguing examples were detected at critical loci 
for grain weight, a trait that has been strongly 
modified across the domestication and selection 
history for its relationship with grain yield 
(Fig. 8.8, Desiderio et al. 2019). Co-location 
was found for TaGW2 on chromosome 6A in 
the WEW-to-DEW transition and TaGW2 on 
2B for both WEW-to-DEW and DEW-to-DWL 
transitions. Additionally, TaSus2-A1, TaSdr-A1, 
and TaCWI-A1 on chromosome 2A and their 
homeologs on 2B were associated with multiple 
extended signals in WEW-to-DEW and in DEW-to-DWL 
transitions, while the durum germplasm 
showed extended regions of low diversity. 

Fig. 8.7 SNP-based diversity index (DI) for the main 
germplasm groups identified in the TGC (WEW, DEW, 
DWL, and DWC). DI is reported as a centered 25 SNP-based 
average sliding window (single SNP step). Top and 
bottom 2.5% DI quantile distributions are highlighted 
as red- and blue-filled dots, respectively (modified from 
Maccaferri et al. 2019) 

Fig. 8.8 Variation for kernel size and shape in tetraploid wheat (modified from Desiderio et al. 2019) 
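
The diversity index described in the Fig. 8.7 caption (a centered 25-SNP sliding average with a single-SNP step, with the top and bottom 2.5% of values highlighted) can be illustrated with a minimal Python sketch. Several choices here are assumptions made for illustration and are not taken from the source: SNPs are treated as biallelic and summarized by one allele frequency, per-SNP gene diversity is computed as expected heterozygosity 2p(1 - p), windows are simply truncated at chromosome ends, and all function names are hypothetical.

```python
from statistics import mean


def gene_diversity(allele_freq):
    """Expected heterozygosity of a biallelic SNP: 1 - p^2 - q^2 = 2p(1 - p)."""
    return 2.0 * allele_freq * (1.0 - allele_freq)


def sliding_di(allele_freqs, window=25):
    """Centered sliding-window mean of per-SNP diversity, moving one SNP at a time.

    Assumption: near chromosome ends the window is truncated rather than padded.
    """
    half = window // 2
    per_snp = [gene_diversity(p) for p in allele_freqs]
    return [
        mean(per_snp[max(0, i - half): i + half + 1])
        for i in range(len(per_snp))
    ]


def flag_extremes(values, tail=0.025):
    """Label each value 'low', 'high', or 'mid' using the bottom/top `tail` fraction."""
    ranked = sorted(values)
    n = len(ranked)
    k = max(1, int(n * tail))      # number of values in each tail
    low_cut = ranked[k - 1]        # largest value still in the bottom tail
    high_cut = ranked[n - k]       # smallest value already in the top tail
    return [
        "low" if v <= low_cut else "high" if v >= high_cut else "mid"
        for v in values
    ]


# Toy chromosome: 30 highly polymorphic SNPs followed by 30 fixed SNPs.
toy_freqs = [0.5] * 30 + [0.0] * 30
toy_di = sliding_di(toy_freqs, window=5)
toy_flags = flag_extremes(toy_di)
```

Comparing such window profiles between germplasm groups (for example WEW versus DWC) is one simple way to visualize regional diversity depletion; the published analysis combined several additional selection metrics.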


8.3 Tetraploid Wheat Genomes 

The reference genomes of the DWC SVEVO (SVEVO 
v.1, Maccaferri et al. 2019) and of the WEW 
accession ZAVITAN (WEWseq v.1.0, Avni et al. 
2017; assembly revised in Zhu et al. 2019 
based on optical mapping) have been sequenced 
in recent years, allowing for a better 
exploitation of the genetic diversity of the tetraploid 
gene pool. The SVEVO genome sequence 
was assembled in 10.46 Gb including 0.5 Gb of 
unassigned scaffolds. Very similar numbers were 
produced for the genome sequence of ZAVITAN 
with an assembly size of 10.5 Gb including 
0.4 Gb of unassigned scaffolds. The alignment 
of the durum wheat genome with high-density 
SNP genetic maps showed the typical pattern 
of recombination with highly recombinogenic 
distal chromosome regions and large pericentro- 
meric regions nearly devoid of recombination. 


A comparison of the two assemblies revealed 
strong overall synteny with high similarity in 
total gene number (66,559 high confidence genes 
in SVEVO vs. 65,012 in ZAVITAN) and repetitive 
element content (82.2% of the total assembly) 
and composition (Maccaferri et al. 2019). 
Nevertheless, a comparison of the orthologous 
gene pairs has highlighted several examples of 
presence-absence variations and of copy num- 
ber variations as expected in a context of pange- 
nome analysis where deletions and gene family 
expansions are frequently found (Walkowiak 
et al. 2020). For many years, genomic studies in 
tetraploid wheats were carried out based on the 
genomic resources developed in bread wheat 
thanks to the extensive sequence similarity and 
gene collinearity between the A and B genomes 
of the two species. As an example, the SNPs 
carried on the wheat 90K iSelect Infinium SNP 
assay (Wang et al. 2014), most of which origi- 
nated from bread wheat, have been extensively 
used in genetic diversity and mapping studies in 
tetraploid wheat. 

The availability of genome sequences for 
wild emmer and durum wheat is expected to 
facilitate genomic studies in tetraploid wheats. 
The projection of the 90K iSelect Infinium SNPs 
to the SVEVO and ZAVITAN genomes allowed 
genes and QTLs to be mapped with higher precision 
and resolution. Moreover, once identified, the 
QTL confidence interval can now be used to 
search for candidate genes directly in the tetraploid 
genome. Such approaches have been used 
in mapping studies based on biparental segregat- 
ing populations, as in the case of the identifica- 
tion of the SrKN gene for stem rust resistance 
from the tetraploid DWC KRONOS (Li et al. 
2021); GWAS for different traits (Saccomanno 
et al. 2018; Aoun et al. 2021), and ultimately to 
better refine QTL regions through meta-QTL 
analysis for quality, abiotic and biotic stresses 
in durum wheat (Soriano et al. 2021). The 
molecular characterization of gene families can 
greatly help in the search of candidate genes for 
a particular trait, in studies on gene mapping, 
functional analysis, and comparative genom- 
ics, as in the case of the analysis of Hsp70 and 
glutathione S-transferases (GSTs) genes in dif- 
ferent Triticum subspecies, with implications on 
evolution of these gene families and molecular 
mechanisms of their involvement in response 
to stress (Lai et al. 2021; Hao et al. 2021). The 
approach can also be focused on the characterization 
of a chromosomal locus, as seen for Gli-2 
locus regions containing α-gliadin genes on the 
A and B genomes of WEW (Huo et al. 2019). 
The availability of genome sequences is also of 
particular interest in transcriptomic studies such 
as those based on RNA-seq, in which project- 
ing the reads onto a high-quality genome can 
greatly improve the accuracy and completeness 
of the analysis (Arenas et al. 2022). In a recent 
study, both the analysis of the translatome, the 
collection of all open reading frames that are 
actively translated, and in vivo RNA structure 
profiling were carried out to investigate the com- 
plex wheat RNA structure landscape in durum 
wheat. The translatome revealed subgenome 
asymmetry at the translational level, due to the 
strong impact of mRNA structure on translation, 
independent of GC content (Yang et al. 2021). 
Matching mapping results with informa- 
tion regarding the gene content and annotation 
of genomic regions provides a huge advantage 
for fine mapping and gene/QTL cloning in 
tetraploid wheat. With the fully assembled 
ZAVITAN genome, the causal mutations in 
Brittle Rachis 1 (TtBtr1) genes controlling shattering, 
a key domestication trait, were identified 
(Avni et al. 2017). More recently, TdHMA3-B1, 
a gene encoding a metal transporter with a 
non-functional variant causing high accumula- 
tion of cadmium in grain, was rapidly cloned 
in SVEVO. Moreover, a wild functional allele, 
characterized by a very low frequency among 
DWCs, was rescued with great advantage for 
durum wheat breeding for low cadmium accumulation 
in grain (Maccaferri et al. 2019). The utility 
of tetraploid wheat genomes has also been shown 
for improvement of resistance to fungal diseases. 
The WEW-derived Yr15, a gene for broad-spectrum 
resistance to stripe rust, was identified and 
cloned in a large mapping population developed 
by crossing the susceptible durum wheat line 
“D447” with introgression lines carrying Yr15 
in the genetic background of “D447” (Klymiuk 
et al. 2018). Both ZAVITAN and SVEVO 
genomes were used to clone and functionally 
characterize Pm41, a powdery mildew resist- 
ance gene derived from WEW, which encodes a 
coiled-coil, nucleotide-binding site, and leucine- 
rich repeat protein (CNL) (Li et al. 2020). 
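
The candidate-gene search mentioned above, in which a QTL confidence interval is intersected with the gene annotation of a reference genome, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Gene` record, the `genes_in_interval` helper, and every gene ID and coordinate below are hypothetical placeholders and do not correspond to the SVEVO or ZAVITAN annotations.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    """Minimal annotation record (hypothetical fields, coordinates in bp)."""
    gene_id: str
    chrom: str
    start: int
    end: int


def genes_in_interval(genes: List[Gene], chrom: str, start: int, end: int) -> List[Gene]:
    """Return genes overlapping [start, end] on `chrom`, sorted by start position.

    A gene is kept if it overlaps the interval at all, so genes that straddle
    a boundary of the confidence interval are not silently dropped.
    """
    hits = [
        g for g in genes
        if g.chrom == chrom and g.start <= end and g.end >= start
    ]
    return sorted(hits, key=lambda g: g.start)


# Invented annotation and QTL interval, for illustration only.
annotation = [
    Gene("geneA", "2B", 100, 200),
    Gene("geneB", "2B", 950, 1100),
    Gene("geneC", "2B", 5000, 6000),
    Gene("geneD", "6A", 150, 300),
]
candidates = genes_in_interval(annotation, chrom="2B", start=150, end=1000)
print([g.gene_id for g in candidates])
```

In practice the resulting list would then be filtered by functional annotation or expression data before any gene is proposed as a candidate.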

Interestingly, all available wheat genomes are 
important in comparative genomic approaches 
to precisely characterize a chromosomal region 
and its gene content in gene cloning studies. 
This way, tetraploid genomes are instrumental 
for fine mapping and cloning of genes not only 
in emmer or durum wheat, but also in bread 
wheat as in the case of Ne2, a typical CNL gene 
responsible for hybrid necrosis in wheat (Si 
et al. 2021). 

All these data indicate that wheat genomes 
for bread (IWGSC 2018), durum (Maccaferri 
et al. 2019), and wild emmer (Avni et al. 2017) 
wheat once merged in a unique wheat pange- 
nome will provide an excellent asset for genetic 
studies in both tetraploid and hexaploid wheat. 
At the same time, the tetraploid and hexa- 
ploid wheat germplasms could be considered a 
unique gene pool from which to recruit genes 
and alleles useful for breeding in both species 
(Mastrangelo and Cattivelli 2021). 


8.4 Tetraploid Germplasm for Bread Wheat Improvements (and Vice Versa) 


The increasing demand for food and more sus- 
tainable crops and the increasingly evident 
climatic changes require the selection of new 
wheat cultivars with improved grain yield, protein 
content, and resistance to biotic and abiotic 
stresses (Foley et al. 2011; Soares et al. 
2019). Achievement of this goal is hindered by 
the limited genetic variability present in wheat 
modern cultivars resulting from multiple domes- 
tication bottlenecks and breeding involving a few 
selected progenitors. This situation prompted 
the attention toward the use of DWLs, non- 
cultivated wheat subspecies, and wild relatives 
to contribute genes conferring traits of inter- 
est and, more in general, to increase the genetic 
diversity of the cultivated gene pool (Reif et al. 
2005). Nowadays, the identification and charac- 
terization of these genes and/or related regions 
(QTLs) are assisted and accelerated by the recent 
technological advances in wheat genomics 
(Tuberosa and Pozniak 2014; Vendramin et al. 
2019; Rasheed and Xia 2019), and their intro- 
gression into elite cultivars can be performed 
both with marker-assisted selection (MAS) and/ 
or with transgenic strategies (Cobb et al. 2019; 
Gadaleta et al. 2008; Mores et al. 2021). 

The close phylogenetic proximity between 
tetraploid and hexaploid wheat allows for the 
transfer of specific genes between the two species, 
as already occurred during wheat evolution 
(Dubcovsky and Dvorak 2007). Crosses 
between the two species are feasible, overcom- 
ing the problems due to necrosis and low fertil- 
ity of hybrids (Klymiuk et al. 2019b; Othmeni 
et al. 2019). Although the superior adaptability 
of the hexaploid genome has made bread wheat 
the most cultivated wheat worldwide, tetraploid 
wheat exhibits greater genetic diversity, a highly 
desirable feature for present and future wheat 
breeding programs (Mastrangelo and Cattivelli 
2021). WEW is probably the most relevant reservoir 
of genetic diversity for durum and bread 
wheat with durum acting as a bridge between 


the wild relative and the bread wheat to facilitate 
the introgression of WEW traits in modern bread 
wheat lines (Maccaferri et al. 2015; Klymiuk 
et al. 2019b). 


8.4.1 Transfer of Disease Resistance Genes 


A number of important genes for biotic stress 
resistance have been transferred into common 
wheat from the primary gene pool of tetraploid 
wheats (T. turgidum ssp. dicoccoides, ssp. dicoccum, 
and ssp. durum), such as those related 
to the most damaging and economically important 
diseases of wheat: the rusts, namely yellow rust 
(Yr; Puccinia striiformis f. sp. tritici), leaf 
rust (Lr; Puccinia triticina Eriks), and stem rust 
(Sr; P. graminis f. sp. tritici), as well as powdery 
mildew (Pm; Blumeria graminis f. sp. tritici) and 
Fusarium head blight (FHB; Fusarium graminearum). 
The list of rust resistance genes identified 
and transferred from durum to hexaploid 
wheat includes the yellow rust resistance genes 
Yr53 (chr. 2BL, Xu et al. 2013), Yr64, and Yr65 
(chr. 1BS, Cheng et al. 2014); the leaf rust resistance 
genes Lr23 (chr. 2BS, McIntosh et al. 1995; 
Sibikeev et al. 2020), Lr61 (chr. 6BS, Herrera-Foessel 
et al. 2008b), and Lr79 (chr. 3BL, 
Qureshi et al. 2018); and the stem rust resistance 
genes Sr12 (chr. 3BL, Sheen and Snyder 1964), 
Sr13 (chr. 6AL, Simons et al. 2011; Zhang 
et al. 2017), Sr8155B1 (chr. 6AS, Nirmala et al. 
2017), and SrKN (chr. 2BL, Li et al. 2021). 
Other stem rust resistance genes (Sr2, Sr13, and 
Sr14) have been transferred into common wheat 
from cultivated emmer. Sr2 (3BS, McIntosh et al. 
1995), a recessive and race non-specific adult 
plant resistance gene, was transferred from the 
emmer variety YAROSLAV into the common wheat 
Hope and represents a major success in resistance 
breeding: it has been deployed in 
many cultivars over the last 80 years and still 
confers effective rust resistance. Sr14 (chr. 1BL) 
was identified in the cultivated emmer Khapli and 
introgressed into the hexaploid cultivar Steinwedel. 
Both Sr2 and Sr14 are currently important 
sources of resistance to Ug99 lineage races of 
stem rust (Singh et al. 2011). Sr13 (chr. 6AL), 
present in both durum and cultivated emmer, 
was transferred to the common wheat variety 
KHAPSTEIN from the DEW KHAPLI. Sr13 
confers resistance to all races in the Ug99 group 
(Jin et al. 2007), but the resistant responses are 
influenced by temperature and genetic background 
(McIntosh et al. 1995; Roelfs and McVey 
1979). Lr53 and Lr64, mapped on chromosomes 
6BS and 6AL, respectively, were transferred 
from WEW to common wheat (Kolmer 2008; 
Dadkhodaie et al. 2011; Huang et al. 2016). 
Several stripe rust resistance genes derived from 
WEW (Yr15 on chr. 1BS, Yr35 on 6BS, Yr36 on 6BS, 
YrH52 on 1BS, and YrSM139 on 1BS) were mapped 
using T. durum x T. dicoccoides segregating populations 
and transferred into bread wheat using 
durum wheat as a "bridge" (Peng et al. 2000; 
Dadkhodaie et al. 2011; Hale et al. 2012; Yaniv 
et al. 2015; Zhang et al. 2016). 

Durum wheat has been used as a source of 
powdery mildew resistance genes (Mld, Pm3h, 
and PmDR147) for bread wheat improvement 
(Miedaner et al. 2019). Mld (chr. 4B, recessive) 
was employed in wheat breeding in combination 
with other Pm resistance genes, such as Pm2 
(chr. 5DS, Bennett 1984) and Pm3h (chr. 1AS, 
dominant, Yahiaoui et al. 2006), and probably 
originated from an Ethiopian durum wheat 
accession (Srichumpa et al. 2005). PmDR147 
(chr. 2AL, dominant) was transferred into bread 
wheat cv. LAIZHOU 953 from the durum wheat 
accession DR147 (Zhu et al. 2004). Two powdery 
mildew resistance genes, formally named 
Pm5a and Pm4a, identified in cultivated emmer, 
were used for bread wheat improvement. Pm5a 
(chr. 7BL, recessive) (McIntosh et al. 1967) 
appeared in the varieties Hope and H-44 along 
with Sr2, while Pm4a (chr. 2AL, 
dominant; The et al. 1979) was transferred to the 
bread wheat variety Chancellor from the Indian 
emmer landrace Khapli (Briggle 1966). WEW is 
a main source of Pm resistance genes (twenty-one) 
for hexaploid wheat (Huang et al. 2016). 
A direct transfer from WEW into bread wheat 
was done for 13 of these, while for the others 
an identification/mapping after a crossing with 
durum wheat or a validation/mapping in durum 
background, followed by transfer into hexaploid 
wheat, was undertaken (Klymiuk et al. 2019b). 

Even if several FHB resistance regions were 
identified in Triticum turgidum ssp., none provides 
a level of resistance comparable to that of 
Fhb1 (3BS) in bread wheat. Until now, only two 
hard red spring wheat cultivars resistant to FHB, 
STEELE and REEDER, were developed from 
crosses in which two cultivated emmer accessions 
resistant to FHB were involved (Mergoum 
et al. 2005; Stack et al. 2003). Hessian fly 
[Hf; Mayetiola destructor (Say) (Diptera: 
Cecidomyiidae)] is an important pest of durum 
and bread wheat (Stuart et al. 2012). To date, 37 
Hf resistance genes have been identified (Bassi 
et al. 2019; Li et al. 2013; Zhao et al. 2020). 
Among them, 15 (H6, H9-H11, H14-H19, H26, 
and H29, all on chr. 1AS; H31 on chr. 5BS; 
and H33 on chr. 3A) were identified in durum 
wheat, and one, Hdic (1AS), was derived from 
an accession of cultivated emmer wheat. Most 
of them, for example H9-H11 and Hdic, have 
been introgressed into common wheat (Patterson 
et al. 1994; Carlson et al. 1978; Stebbins et al. 
1982; Liu et al. 2005), but only a few have been 
deployed in commercial cultivars. 

While all the genes described above were 
identified in tetraploid wheat and then intro- 
gressed into bread wheat, several disease- 
resistant genes moved in the opposite direction. 
Noteworthy is the case of the introgression 
and validation of the bread wheat locus Fhb1 
in three European durum wheat genotypes for 
resistance to Fusarium head blight (FHB) (Prat 
et al. 2017). Indeed, durum wheat is particularly 
susceptible to FHB, and limited genetic 
variation has been found so far within durum 
modern germplasm. On the contrary, Fhb1 is 
a major determinant of FHB resistance found 
in the hexaploid wheat SUMAI-3 (Anderson 
et al. 2001). Another successful example is pro- 
vided by the common wheat broad resistance 
gene Lr34/Yr18/Sr57/Pm38/Ltn1 that was trans- 
ferred into a Canadian cultivar by transgenesis 
(Rinaldo et al. 2017). 


8.4.2 Transfer of Quality-Related Loci 


Gpc-B1 (chr. 6BS, also known as NAM-B1), 
a gene responsible for high grain protein and 
mineral content, was first identified in WEW 
and cloned by a map-based approach (Uauy 
et al. 2006), then successfully introgressed into 
many durum and bread wheat cultivars (Tabbita 
et al. 2017). While many of the genes described 
above were identified in tetraploid wheat and 
then introgressed into bread wheat, several genes 
moved in the opposite direction. For instance, 
some glutenin loci responsible for gluten elasticity 
and extensibility were introgressed into durum 
to improve the quality of bread made with semolina. 
Two approaches have been undertaken to 
introgress bread wheat loci into durum wheat. The 
Glu-D1 (chr. 1DL) alleles associated with good 
baking quality were introgressed in durum wheat 
either using lines carrying a mutation in the Pairing 
homolog-1 (ph1b) gene, thus promoting homoeologous 
recombination (Gennaro et al. 2012), or 
by standard crosses with triticale as intermediate 
(Lukaszewsky 2003). Similarly, the introgression 
of Glu-D1-1d and Glu-D1-2b from bread 
to durum wheat resulted in dough with stronger 
mixing features (Gadaleta et al. 2008). Ph1b-mediated 
chromosomal translocations (5DS-5BS) 
were also employed to produce a tetraploid wheat 
with soft grains by transferring the Hardness 
locus controlling kernel texture from bread wheat 
chromosome 5D (Boehm et al. 2017). 


8.4.3 Transfer of Abiotic Stress-Related Loci 

The introgression of specific alleles at 
Vernalization loci from winter bread wheat has 
led to durum wheat more adapted to cold environments 
(Longin et al. 2013). The tolerance to 
Al³⁺ in acidic soils of durum wheats was also 
improved by the transfer of TaMATE1B and 
TaALMT1 (chr. 4B and 4D, respectively), which 
confer a high level of Al³⁺ tolerance among 
bread wheats (Delhaize et al. 2012; Han et al. 
2016). The TaMATE1B gene, responsible for 
constitutive citrate efflux from root tips (Tovkach 
et al. 2013), also showed a positive effect on grain 
yield, probably due to increased root growth 
and proliferation. Indeed, the transgenic durum line 
JANDAROI-TaMATE1B, compared to JANDAROI 
per se, showed a significantly higher total root biomass 
and produced from 25.3 to 49.0% higher 
grain yield under both well-watered and terminal 
drought conditions (Pooniya et al. 2020). 

The other way round, examples of gene transfer 
from durum to bread wheat for abiotic 
stress tolerance are more limited due to the 
complex genetic bases of this trait. The great 
genetic diversity present in collections of tetraploid 
wheat accessions, from WEWs to DWCs, 
represents an asset for future efforts aimed at the 
identification of loci explaining a good fraction 
of the phenotypic variation for resistance to abiotic 
stresses, to be introgressed into hexaploid wheat. 


8.5 Synthetic Wheats 

Hexaploid wheat (Triticum aestivum L.), subgenomes 
AABBDD, is a natural amphiploid 
derived from an interspecific cross between the 
tetraploid wheat species T. turgidum L. (AABB) 
and the diploid grass Aegilops tauschii Coss. 
(DD). The origin of common wheat is non-monophyletic: 
at the time of wheat's 
origin 10,000 years ago, the formation of more 
than one interspecific amphiploid contributed to 
the creation of common wheat (Caldwell et al. 
2004). The resulting bottleneck effect has limited 
its genetic diversity compared with tetraploid 
and diploid wheats (Cox 1997). For this 
reason, breeders are looking at tools to increase 
the genetic diversity to be exploited in mainstream 
breeding on a worldwide scale. Based 
on those efforts, two distinct approaches were 
deployed: production of amphiploids, known as 
synthetic hexaploids, between T. turgidum and 
Ae. tauschii, and direct hybridization between 
T. aestivum and Ae. tauschii. Both approaches 
involve backcrossing to T. aestivum (Cox et al. 
2017). The direct hybridization approach aims 
at increasing the genetic diversity for the D 
genome, and this is an important issue in wheat 
breeding, as very low diversity values characterize 
this genome compared to A and B genomes 
(Poland et al. 2012; Maccaferri et al. 2015). 
Studies reporting the improvement of bread 
wheat for traits related to both agronomic per- 
formance (grain yield and quality) and resist- 
ance to fungal diseases and pests have been 
carried out via direct hybridization (Cox et al. 
2017). Nevertheless, the reduced genetic diver- 
sity following the bottleneck at the origin of 
bread wheat also involves A and B genomes. 
Accordingly, the development of synthetic 
hexaploid wheats (SHWs) allows for enhancing 
the diversity for all the three wheat genomes and 
for the direct transfer of loci for traits of interest 
from tetraploid to hexaploid wheat. The analy- 
sis of the population structure of a panel of 121 
SHW lines genotypically characterized with 
35,939 high-quality SNPs derived from geno- 
typing-by-sequencing revealed that the percent- 
age of SNPs on the D genome was nearly the 
same as the other two genomes (nearly 30%), 
demonstrating the effectiveness of this approach 
to enhance genetic diversity of the D genome 
(Bhatta et al. 2018a). When SHW and bread 
wheat groups were compared at the level of the 
entire genome, the gene diversity of SHWs was 
from 33.2 to 50% higher compared with a sample 
of elite bread wheat cultivars in two distinct 
SHW panels (Bhatta et al. 2018a, 2019a). When 
these panels were used in GWAS, QTLs for 
yield and quality-related traits, as well as disease 
resistance, were identified on all the three 
genomes, underlining the importance of both 
durum wheat and Aegilops parents in increasing 
genetic diversity and providing alleles of interest 
for breeding of bread wheat (Bhatta et al. 2018b, 
c, 2019b). A relevant contribution of the durum 
parents has been revealed also for traits usually 
associated with the D genome. As an example, 
although tolerance to Al³⁺ toxicity has been 
mainly linked to the TaALMT1 gene carried by 
the D genome (Han et al. 2016), a GWAS with 
300 SHW lines has identified, besides the effect 
of TaALMT1, many other QTLs, mostly located 
on A and B genomes (Emebiri et al. 2020). 
Similar results were found in a GWAS with 
173 SHWs for leaf, stem and yellow rusts, yellow 
leaf spot, Septoria nodorum, and crown rot 
(Jighly et al. 2016). 

SHW lines have been also used as parents 
of segregating populations to identify QTLs for 
specific traits. In these studies, the SHW parent 
usually incorporates a variable number of loci 
from the Ae. tauschii parent, but some impor- 
tant QTLs are also contributed by the tetraploid 
parent of the SHW line. Some examples are 
available for root traits under drought stress con- 
ditions. Liu et al. (2020) analyzed a RIL popu- 
lation of 111 individuals derived from a bread 
wheat cultivar crossed to a SHW line, which 
incorporated mainly QTLs on the D genome, but 
also some QTLs, as those on chromosome 2B, 
which were probably derived from the tetraploid 
parent. A RIL mapping population derived from 
a cross between W7984 (synthetic) and OPATA 
85 was evaluated for root length and root dry 
weight under water stress and control condi- 
tions. QTLs common to both water conditions 
and stress specific were identified on A, B, and 
D genomes (Ayalew et al. 2017). The same pop- 
ulation together with a doubled haploid popu- 
lation derived from the same parents (W7984 
and OPATA) were used to identify QTLs for 
the number of crossovers (Gutierrez-Gonzalez et al. 
2019). Similar results, in terms of genetic contribution 
from A and B genomes, were found for 
traits related to resistance to stem rust (Dunckel 
et al. 2015; Sharma et al. 2021) and root rot 
(Mahoney et al. 2017). 

Pshenichnikova et al. (2020) developed a 
SHW line from a cross between accessions of T. 
dicoccoides and Ae. tauschii. The resulting line 
(SYN6) was crossed with CHINESE SPRING 
to obtain a set of 21 substitution lines, each 
containing 20 chromosomes from CHINESE 
SPRING and one from SYN6. The 1A substi- 
tution resulted in a substantial reduction of root 
length and weight, while a chromosome 5D 
substitution led to a significant increase com- 
pared to the recipient and the donor lines under 
two contrasting irrigation regimes. Developing 
synthetic lines through interspecific crosses can 
have consequences on the genomic asset of the 
resulting lines. Sequence elimination can happen 
after allopolyploidization, increasing the divergence 
among homoeologous chromosomes. 
This phenomenon has practical consequences 
on bread wheat breeding, as it can cause the loss 
of important genes. A recent study in which the 
differences between synthetic and natural hexa- 
ploid wheat lines were investigated by utilizing 
a large germplasm set of primary synthetics and 
synthetic derivatives revealed that reproduc- 
ible segment elimination occurrence was highly 
dependent on the choice of diploid and tetra- 
ploid parental lines and, that the almost com- 
plete short arm of chromosome 1B carrying loci 
important for grain quality, was eliminated in 
one line (Jighly et al. 2019). In a different study 
in which 1862 mapped loci were compared 
between synthetic wheat SHW-L1 and its parental 
lines Ae. tauschii AS60 (DD) and T. turgidum 
AS2255 (AABB), the D genome of SHW-L1 
showed a higher number of eliminated loci fol- 
lowing the allopolyploidization compared to the 
A and B genomes (Yu et al. 2017). At a pheno- 
typic level, hybrid chlorosis can be observed 
in SHW lines, and genetic loci involved in this 
phenomenon have been identified on D genome 
(Nakano et al. 2015; Nishijima et al. 2018). 
Despite the possibility of losing some loci 
of interest, SHW lines have been extensively 
used in bread wheat breeding. An example is 
given by the importance of SHW lines in breed- 
ing programs at CIMMYT, where more than 
1500 SHWs have been developed since the 1980s 
and thousands of crosses have been gener- 
ated with bread wheat to obtain synthetic lines. 
With this approach, advanced lines with excel- 
lent performance for yield and other traits have 
been obtained, and more than 80 have also 
been released as cultivars and are widely grown 
(Rosyara et al. 2019). Some very promising 
lines were obtained with adaptation to specific 
environments. As an example, a large breeding 
program was aimed at developing and evalu- 
ating SHW lines derived from winter durum 
wheat germplasm from Ukraine and Romania 
crossed with Ae. tauschii accessions from the 
Caspian Sea region at CIMMYT. These popu- 
lations, subjected to rigorous pedigree selection 
under dry, cold, disease-affected environments 
of the Central Anatolian Plateau, provided superior 
lines characterized by resistance to leaf, 
stripe and stem rust, common bunt, and soil- 
borne pathogens, with the contribution of both 
durum and Aegilops parents (Morgounov et al. 
2018). 

The breeding programs involving SHWs 
are also deploying genomic selection. Ninety- 
seven populations were developed using first 
back-cross, biparental, and three-way crosses 
between 33 primary SHW genotypes and 20 
spring bread wheat cultivars at CIMMYT. 
Genomic estimated breeding values (GEBVs) 
of parents and synthetic derived lines were esti- 
mated using a genomic best linear unbiased 
prediction (GBLUP) model, and higher GEBVs 
of progenies were related to introgression and 
retention of positive alleles from SHW parents 
(Jafarzadeh et al. 2016). Different results 
were shown by Dunckel et al. (2017), who analyzed 
selected lines from doubled haploid and 
RIL populations between six different primary 
synthetics and the elite cultivar OPATA M85 
chosen for grain yield and other important agro- 
nomic traits. Overall, the prediction models 
had only moderate predictive ability, slightly 
lower than expected based on traits’ heritabil- 
ity. Nevertheless, a more recent study, based on 
SHW populations and SHW derivatives coming 
from crosses between the primary SHWs and 
bread wheat cultivars, suggested that models 
with heterogeneous additive genetic variances 
may be suitable to predict breeding values in 
wheat crosses with variable ploidy levels (Puhl 
et al. 2021). 
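
To make the GBLUP idea more concrete, the following pure-Python sketch estimates GEBVs from a marker matrix coded 0/1/2. It is a simplified illustration, not the pipeline used in the cited studies, and it rests on several assumptions that are not from the source: the genomic relationship matrix follows a VanRaden-style construction, the overall mean is the only fixed effect and is estimated by the raw phenotype mean, and the variance ratio `lam` (residual over additive genetic variance) is treated as known. All function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def g_matrix(markers):
    """Genomic relationship matrix G = Z Z' / (2 * sum p(1 - p)), Z = centered genotypes.

    Assumes at least one marker is polymorphic, otherwise the scale is zero.
    """
    n, m = len(markers), len(markers[0])
    p = [sum(row[j] for row in markers) / (2.0 * n) for j in range(m)]
    z = [[row[j] - 2.0 * p[j] for j in range(m)] for row in markers]
    scale = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    return [
        [sum(z[i][k] * z[j][k] for k in range(m)) / scale for j in range(n)]
        for i in range(n)
    ]


def gblup(markers, phenotypes, lam=1.0):
    """GEBVs as G (G + lam I)^-1 (y - mean(y)); lam is an assumed variance ratio."""
    n = len(phenotypes)
    mu = sum(phenotypes) / n
    y_centered = [y - mu for y in phenotypes]
    g = g_matrix(markers)
    lhs = [[g[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(lhs, y_centered)
    return [sum(g[i][j] * alpha[j] for j in range(n)) for i in range(n)]


# Toy example: four lines, three markers, phenotype increasing with allele dosage.
toy_markers = [[0, 0, 0], [2, 0, 0], [2, 2, 0], [2, 2, 2]]
toy_gebv = gblup(toy_markers, [1.0, 3.0, 5.0, 7.0], lam=1.0)
```

In this toy case the GEBVs are shrunken versions of the centered phenotypes and preserve their ranking; with real data, `lam` would be estimated (for example by REML) rather than fixed.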

In conclusion, the loss of genetic diversity in 
bread wheat due to bottlenecks from polyploidy, 
domestication, and modern plant breeding can 
be compensated by introducing diversity in all 
the three wheat genomes from Ae. tauschii and 
durum wheat. The increasing use of SHWs and 
SHW derivatives worldwide and at CIMMYT 
indicates the success of these approaches in 
improving bread wheat for many traits of 
interest, from yield and yield-related traits 
to resistance to biotic and/or abiotic stresses. 
Considering the studies based on SHWs, a major
limiting factor could be the low number of
durum wheat genotypes used as tetraploid parents
of the crosses for the development of primary
SHWs, ALTAR 84 and LANGDON being
among the most frequently employed. This is in
contrast with the large number of Ae. tauschii
accessions used to broaden the genetic diversity
of the D genome. As durum wheat genomics
comes of age (Tuberosa and Pozniak 2014)
and haplotype variation linked with target phenotypes
of key traits becomes increasingly available
(Maccaferri et al. 2019; Mazzucotelli et al.
2020), genomics-assisted breeding for durum
wheat is undergoing a notable evolution, one that
will be accelerated by widening the basis of the
tetraploid germplasm harnessed for SHW development
by choosing the most suitable parents
among the most recent and diverse products of
durum wheat breeding, hence increasing the
contribution of the tetraploid gene pool to hexaploid
wheat improvement.


8.6 Conclusions and Perspectives 

Recent studies and breeding approaches clearly
indicate that the tetraploid and hexaploid wheats
are merging into a single gene pool, where it will
become increasingly easy to recruit genes and
alleles for tetraploid and hexaploid wheat breeding
regardless of the genome configuration of
the donor genotype (Mastrangelo and Cattivelli
2021). Wild and domesticated emmer wheat
and T. turgidum subspecies and landraces are
resources of outstanding importance for breeding
of both durum and bread wheat, and there
is increasing evidence of elite cultivars carrying
genes recruited from emmer and wheat landraces.
The different wheat genomes will soon
merge into a single wheat pangenome that will
change forever the current breeding strategies.
Genes will “shuttle” easily from wild accessions
to cultivars and between tetraploid and
hexaploid wheats, making possible the fast introgression
of new traits (e.g., disease resistance
and traits for coping with the effects of climate
change) and the selection of varieties increasingly
based on genome knowledge, which will
provide an accurate and holistic view of the
wheat genome, while safeguarding and ensuring
an adequate level of food security for mankind.


Genome-Informed Discovery
of Genes and Framework
of Functional Genes in Wheat


Awais Rasheed, Humaira Qayyum and Rudi Appels 


Abstract 


The complete reference genome of wheat 
was released in 2018 (IWGSC in Science 
361:eaar7191, 2018), and since then many 
wheat genomic resources have been developed
in a short period of time. These
resources include resequencing of several 
hundred wheat varieties, exome capture from 
thousands of wheat germplasm lines, large- 
scale RNAseq studies, and complete genome 
sequences with de novo assemblies of 17 
important cultivars. These genomic resources 
provide impetus for accelerated gene discov- 
ery and manipulation of genes for genetic 
improvement in wheat. The groundwork for 
this prospect includes the discovery of more 
than 200 genes using classical gene map- 
ping techniques and comparative genomics 
approaches to explain moderate to major phenotypic
variations in wheat. Similarly, QTL
repositories are available in wheat which are
frequently used by wheat genetics researchers
and breeding communities for reference. The
current wheat genome annotation lags
in pinpointing the already discovered
genes and QTL, and annotation of such information
on the wheat genome sequence can
significantly improve its value as a reference
document to be used in wheat breeding. We
aligned the currently discovered genes to the
reference genome, provide their positions and
TraesIDs, and present a framework to annotate
such genes in future.


Keywords 


Wheat genomics · Single nucleotide
polymorphisms (SNPs) · KASP markers ·
Gene discovery · Functional markers · Gene
networks


9.1 Introduction 

Wheat holds a central position among major
food crops by providing 20% of the total caloric
requirements for humans around the world.
Common wheat (Triticum aestivum L.) is an 
allohexaploid (2n=6x=42; AABBDD) crop 
successfully cultivated all over the world cov- 
ering an area of approximately 220 million ha. 
Genetic improvement in wheat productivity,
resilience to climate extremes, and quality are
challenges to be met in continuing to feed the
global population, mitigate the effects of climate
change, and fulfill end-user quality preferences.
Since the expansion of wheat production
area will not be possible due to the continuous
shrinking of arable land, increasing
grain yield through improved agronomic practices
and breeding is the feasible approach. It has
been recognized that conventional crop breeding
approaches are not able to deliver the target
of a 70% increase in crop productivity by
2050 (Tester and Langridge 2010). The
innovation required in all breeding components 
includes selection accuracy, selection intensity, 
deploying new genetic variations, and shorten- 
ing of the breeding cycles in developing culti- 
vars (Li et al. 2018). 

Conventional plant breeding has relied
heavily on the selection of key yield-related
phenotypes such as harvest index in
wheat (Lopes et al. 2012), and it seems impossible
to further improve harvest index using
conventional breeding. Secondly, phenotype-based
selections are labor intensive and
time consuming, and offspring can only be
selected at certain homozygous generations and at
later growth stages. The concept of genomics-assisted
breeding (GAB) was proposed as an
alternative to overcome the selection challenges
associated with conventional breeding (Varshney
et al. 2005). The marker-assisted selection com- 
ponent dominated in the breeding programs 
where the diagnostic markers for the genes 
with major phenotypic effects were developed 
and successfully used for selection (Liu et al. 
2012). However, many complex traits such as 
yield and adaptability to stressed environments 
are controlled by many genes with minor effects 
or quantitative trait loci (QTL), further interacting
with the environment (Gao et al. 2015). Their
individual effects are too small to be efficiently 
captured by one or few markers (Bernardo and 
Yu 2007). Therefore, a transition from marker 
to genome-based breeding is indispensable to 
achieve the productivity targets (Rasheed and 
Xia 2019).

Next-generation sequencing (NGS) has
revolutionized plant genomics and resulted in the
development of techniques and resources amenable
to plant breeding (Bevan et al. 2017). The
ever-growing plant genomic resources have provided
a plethora of SNP information distributed
throughout the plant genomes, which has made
SNPs the markers of choice for a variety of research
applications, especially in breeding and genetics
research. Reference genome
sequences are now available for most crop
species, including wheat, while pan-genome
sequences are increasing at a rapid pace
(see Chap. 14). Characterization of the pan-genome
can rapidly identify variations within
the candidate genes, which have a direct appli- 
cation in breeding. In this chapter, we discuss 
different genome-informed scenarios being pur- 
sued to discover genes underpinning important 
phenotypes (Blake et al. 2016). We also provide 
a framework of functional genes of wheat in the 
context of the recent reference genome sequence 
assembly and discuss database resources neces- 
sary to reduce redundancy in research. 


9.2 Wheat Reference Genome
Sequence and Other Genomic
Resources


9.2.1 The Reference Genome
Sequence of cv. CHINESE
SPRING


Wheat has a long history of being used as a model plant
for understanding cytogenetics, physical mapping
of genes, and for facilitating pre-breeding to
introduce interspecific and intergeneric diversity.
For example, the array of wheat aneuploid
stocks, unequaled in any other crop, was developed
by Sears (1954). All these genetic stocks
were developed using wheat cv. CHINESE
SPRING (Sears and Sears 1978). Such aneuploids
include all the possible chromosome
addition or deletion lines in the form of nullisomics,
trisomics, monosomics, and tetrasomics.
These cytogenetic stocks greatly facilitated
genetic studies which were not possible in
many of the higher organisms at that time. These
stocks were used to identify major genes con- 
trolling important traits and physically map their 
positions along chromosomes, including the 
genes related to waxiness, maturity, endosperm 
proteins, and vernalization (Driscoll and Jensen 
1964; Shepherd 1968; Halloran and Boydell 
1967; Law 1966). Later, these efforts provided 
the basis for starting a ‘Catalogue of Gene 
Symbols for Wheat’ to catalogue wheat genes 
(McIntosh 1973). Since a wide array of genetic 
stocks were available in the CHINESE SPRING 
wheat background, this cultivar was selected 
to develop the first reference genome sequence 
in wheat. The International Wheat Genome
Sequencing Consortium (IWGSC) was established
in 2005, and the high-quality reference
sequence was released 13 years later,
in 2018 (IWGSC 2018).


9.2.2 Other Genomic Resources 
in Wheat 


All genome sequence resources available in
wheat to date are provided in Table 9.1 and
include population-level whole-genome resequencing,
exome sequencing, and to a lesser
extent some SNP genotyping resources. The
analysis of the CHINESE SPRING reference
genome is now complemented by de novo
sequences of ten important wheat cultivars
from global breeding programs and has allowed
the documentation of breeding histories, wild
introgressions in the cultivated wheat, and chromosomal
structural rearrangements that facilitated
wheat breeding (Walkowiak et al. 2020;
Jayakodi et al. 2021). Apart from the sequencing
efforts in cultivated wheat, the genome
sequences of diploid and tetraploid progenitors
of bread wheat including Ae. tauschii (Zhao
et al. 2017), T. monococcum (Ling et al. 2018),
Ae. speltoides (Avni et al. 2022), and T. dicoccoides
(Avni et al. 2017) are available. Recently,
a population-level genome sequence resource
of global Ae. tauschii accessions was provided
for use in trait discovery and functional genetic
validation of D-genome introgressions in bread
wheat (Gaurav et al. 2022). The shared utility
of all such resources lies in underpinning the
assignment of functional attributes to genes
through association genetics or by selective
sweeps. For example, 120 Chinese wheat cul- 
tivars and landraces were resequenced, and it 
was identified that the D-subgenome of mod- 
ern cultivars is mostly derived from landraces, 
while A- and B-subgenomes were mainly 
derived from European landraces (Chen et al. 
2019). Strong signals of selective sweeps were 
restricted to 48 high-confidence (HC) genes 
selected during modern wheat breeding. The 
strongest signals were for genes TaNPF6.1-6B, 
TaNAC24, and TaRVE3, which are associated 
with nitrogen use efficiency, drought and heat 
stress tolerance, and flowering time, respectively 
(Chen et al. 2019). 

The exome capture of more than 500 global
wheat accessions was conducted to identify
the genes underpinning the selection and adaptation
of modern-day bread wheat during the last
10,000 years (Pont et al. 2019). The authors
concluded that dispersion of wheat and human 
migration patterns were consistent with an ori- 
gin out of the Fertile Crescent and Egypt to 
Maghreb (Northern Africa) with a coastal route. 
The major driving forces in wheat adaptation 
were the vernalization requirement, histori- 
cal groupings, and geographic origins (Europe, 
Asia, Africa, and America) and thus resulted in 
the partitioning of the genetic diversity in wheat. 
Furthermore, a total of 168 Mb of genomic
regions on different chromosomes contained
selective sweeps which were identical between
the Asian and European germplasm, even
though European wheats had more frequent
introgressions compared to wheats from Eastern
Asia (He et al. 2019; Zhou et al. 2020). These
findings were based on
the resequencing of 890 bread and durum wheat
accessions and the identification of introgressions
from wild species favoring global wheat
adaptation. Another globally important genomic
resource is the DArTseq database of 44,624
wheat accessions from the International Maize
and Wheat Improvement Center (CIMMYT)
GenBank (Juliana et al. 2019). The DArTseq
data were used to conduct genome-wide association
studies (GWAS) for 50 different traits of
breeding interest and identified important loci
for end-use quality and for biotic and abiotic stress
resistance. These studies provide a deep insight
into genetic diversity and genetic regions in
wheat under artificial and natural selection and
will continue to be important resources for the use of
such information in breeding.


Table 9.1 Wheat genomic resources post-reference genome sequence

Resource | Number of accessions | Sequencing strategy | Objective | Reference
Pan-genome | 10 | WGS | Build a pan-genome of wheat | Walkowiak et al. (2020)
Chinese accessions | 120 | WGR | Identify the selection regions during wheat breeding | Chen et al. (2019)
Global landraces and cultivated wheats | 4506 | 280 K SNP array | Wheat phylogeography and genetic diversity | Balfourier et al. (2019)
Global wheat accessions | 500 | Exome sequencing | Years of hybridization, selection, adaptation, and plant breeding have shaped the genetic makeup of modern bread wheats | Pont et al. (2019)
Hexaploid/tetraploid accessions | 890 | Exome sequencing | Identify the wild-relative introgressions favoring global wheat adaptation | He et al. (2019)
Chinese wheat accessions | TIO | DArTseq/660 K | Dispersion history, adaptive evolution, and selection of wheat in China | Zhou et al. (2018)
CIMMYT germplasm | 44,624 | DArTseq | Genomic predictabilities of 35 key traits and demonstrate the potential of genomic selection for wheat end-use quality | Juliana et al. (2019)
Ae. tauschii global collection | 242 | WGR | D-genome diversity for gene discovery | Gaurav et al. (2022)
Chinese minicore collection | 287 | Exome sequencing | Identify genetic regions associated with yield and adaptability | Li et al. (2022a)
Elite cultivars of China | 145 | WGR | Seventy years of breeder-driven selection | Hao et al. (2020)
25 wild wheat populations | 414 | WGR | Introgression from wild populations | Zhou et al. (2020)
Aegilops tauschii | 278 | WGR | Novel haplotypes with potential applications in wheat improvement | Zhou et al. (2021)

WGS: Whole-genome sequencing; WGR: Whole-genome resequencing


9.3 Wheat Functional Genes
Discovery: Strategies
and Inventory


Quantitative trait loci (QTL) mapping and 
GWAS have dominated wheat genomics 
research to date. These studies identify the 
favorable alleles and their diagnostic mark- 
ers which can be then used in wheat breeding 
to introgress important QTL or genes (Rasheed 
and Xia 2019). In Table 9.2, we provide a near-complete
framework of the functional genes
discovered so far by such approaches. However,
such genetic dissection especially in case of 
GWAS can be ambiguous due to the confound- 
ing effects of population structure or low-accu- 
racy genotype calls at some loci (Browning and 
Yu 2009), or due to the small population size 
(Finno et al. 2014). It is, therefore, necessary to 
further validate the phenotypic effects of such 
loci in biparental mapping populations or other 
genetic backgrounds, as well as by other biological
means such as genetic transformation,
gene silencing or gene knockout, and gene editing.
Population-level whole-genome resequencing
or exome capture data have facilitated the
discovery of several genes for economically
important traits. From the resequencing data of
145 Chinese wheat accessions, Hao et al. (2020)
identified that the TaFRK2-7A gene contained three
non-synonymous mutations compared to the CHINESE
SPRING (CS) allele and was strongly associated with starch
and amylose contents in mature seeds. The
exome sequences of 287 wheat accessions identified
the causal variations in TaARF12 encoding
an auxin response factor and TaDEP1 encoding
the G-protein γ-subunit, pleiotropically regulating
both plant height and grain weight in wheat
(Li et al. 2022a, b).
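
The gene identifiers and positions listed in Table 9.2 can be handled programmatically. The sketch below is illustrative only: it assumes that a Traes ID such as TraesCS1A02G291100 is composed of the prefix "TraesCS", a chromosome label (or "U" for unassigned), a two-digit annotation version, the letter "G", a six-digit gene number, and an optional "LC" suffix for low-confidence genes, and that positions such as chr1A:487330367.0.487333932 encode start and end coordinates separated by ".0.". The helper names are hypothetical, and stray spaces inside identifiers are removed before matching.

```python
import re
from typing import NamedTuple, Tuple


class TraesID(NamedTuple):
    chromosome: str       # e.g. "1A", or "U" for unassigned scaffolds
    version: str          # two-digit annotation version, e.g. "02"
    number: int           # six-digit gene number
    low_confidence: bool  # True when the ID carries the "LC" suffix


# Assumed ID layout: TraesCS + chromosome + version + "G" + number (+ "LC")
_TRAES = re.compile(r"^TraesCS([1-7][ABD]|U)(\d{2})G(\d{6})(LC)?$")
# Assumed position layout as printed in Table 9.2: chr<name>:<start>.0.<end>
_POSITION = re.compile(r"^chr([1-7][ABD]|Un):(\d+)\.0\.(\d+)$", re.IGNORECASE)


def parse_traes_id(text: str) -> TraesID:
    """Parse a Traes ID, ignoring whitespace introduced by text extraction."""
    cleaned = re.sub(r"\s+", "", text)
    match = _TRAES.match(cleaned)
    if match is None:
        raise ValueError(f"not a recognised Traes ID: {text!r}")
    chrom, version, number, lc = match.groups()
    return TraesID(chrom, version, int(number), lc is not None)


def parse_position(text: str) -> Tuple[str, int, int]:
    """Return (chromosome, start, end) for a position string from Table 9.2."""
    cleaned = re.sub(r"\s+", "", text)
    match = _POSITION.match(cleaned)
    if match is None:
        raise ValueError(f"not a recognised position: {text!r}")
    chrom, start, end = match.groups()
    return chrom, int(start), int(end)


print(parse_traes_id("TraesCS1A02G291100"))
print(parse_position("chr1A:487330367.0.487333932"))
```

A simple consistency check follows from this: for most rows, the chromosome parsed from the position should agree with the chromosome embedded in the Traes ID.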

In recent years, several loci were identified
simultaneously by GWAS and biparental
mapping strategies. Liu et al. (2017) identified
marker-trait associations for black point resistance.
Loci underpinning flour color (Zhai et al.
2016), kernel number per spike (Shi et al. 2017),
and thousand grain weight (Sehgal et al. 2020;
Wang et al. 2021) were also identified following
a similar strategy. A functional gene associated
with flour color, TaRPP13L1, was
identified by GWAS in wheat cultivars from
China and validated using two KRONOS wheat
mutants carrying premature stop codons in the
gene (Chen et al. 2019).

Another gene discovery approach which
is now widely used is bulked segregant analysis
(BSA), where DNA from individuals of a
population showing contrasting, extreme
phenotypes is pooled and then RNAseq, exome
sequencing, or whole-genome resequencing is
applied (Zou et al. 2016). This is a rapid method
to identify consistent polymorphic regions
between contrasting pools of wheat lines. In
addition to the discovery of SNPs between contrasting
pools, differentially expressed genes
can also be identified in the case of RNAseq
analysis of tissues. Using this approach, a QTL
interval with four candidate genes has been
discovered on chr4A underpinning resistance
against orange wheat blossom midge (OWBM)
affecting wheat production in many countries
(Hao et al. 2019). Likewise, resistance to yellow
rust in wheat cultivar ZHOUMAI 22 was delimited
to a physical interval of 4 Mb using a BSA
and RNAseq approach (Wang et al. 2017a).
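
The comparison of contrasting pools can be sketched with one commonly used statistic, the delta SNP-index (the difference in alternative-allele read frequency between two bulks). This is an illustrative assumption; the studies cited here may have used other statistics. The function names, the minimum depth, and the threshold are hypothetical choices.

```python
from typing import Dict, List, Tuple

# position -> (reads supporting the alternative allele, total read depth)
PoolCounts = Dict[int, Tuple[int, int]]


def snp_index(alt_reads: int, depth: int) -> float:
    """Fraction of reads in a pool that carry the alternative allele."""
    return alt_reads / depth


def delta_snp_index(pool_a: PoolCounts, pool_b: PoolCounts,
                    min_depth: int = 10) -> Dict[int, float]:
    """SNP-index of pool A minus pool B at positions covered in both pools."""
    floor = max(1, min_depth)  # also guards against division by zero
    deltas = {}
    for pos in sorted(set(pool_a) & set(pool_b)):
        alt_a, depth_a = pool_a[pos]
        alt_b, depth_b = pool_b[pos]
        if depth_a < floor or depth_b < floor:
            continue  # skip poorly covered sites
        deltas[pos] = snp_index(alt_a, depth_a) - snp_index(alt_b, depth_b)
    return deltas


def candidate_positions(deltas: Dict[int, float], threshold: float = 0.5) -> List[int]:
    """Positions whose allele frequencies differ strongly between the pools."""
    return [pos for pos, d in sorted(deltas.items()) if abs(d) >= threshold]


# Toy read counts for a resistant and a susceptible bulk (hypothetical)
resistant = {100: (18, 20), 200: (10, 20), 300: (3, 4)}
susceptible = {100: (2, 20), 200: (11, 20), 300: (1, 4)}
deltas = delta_snp_index(resistant, susceptible)
print(candidate_positions(deltas))  # [100]
```

In practice the statistic is usually averaged over sliding windows along each chromosome, so that a delimited interval reflects many linked SNPs rather than a single site.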
Other studies where this approach has been 
effective in discovering candidate genes include 
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Table 9.2 Framework of functional genes characterized in wheat with positions in wheat genome and associated 


traits 
Gene 
TaNAATI-A 
TaNAAT2-A 
Glu-Al 
Glu-A3 
TaPYLI-1B 
TOE-BI 
ELF3-B1 
TaFT3-B1 
TaNAATI-B 
TaNAAT2-B 
Glu-B1-717 
Glu-B3 
AGP-L-1B 
Elf3-D1 
Mot-D1 
TaNAATI-D 
TaNAAT2-D 
Glu-D1 
ZDS-A1 
Ppd-Al 
TaVIT1-2A 
TaNASI-A 
TaNAS3-A 
TaNAS9-A 
Sus2-2A 
TaCwi-Al 
TaCYP78A5 
WFZP-AI 
TaGS2-A1 
TaARFI2 
PPO-AI 
Ppo2-Al 
RMD-AI 
TaRSL4 
Sdr-Al 
Ppd-B1 
TaVIT1-2B 
TaNASI-B 
TaNAS3-B 
TaNAS9-B 
TaSus2-2B 
Tabas1 
GNI 
TaDA1-B 
TaGS2-B1 
PPO-B1 
Ppo2-B1 
TaVSR-B1 
RMD-B1 
TaRSL4 
Sdr-B1 
ZDS-D1 
Ppd-D1 


Phenotype 
GFe/GZn 

GFe/GZn 
Gluten/end-use quality 
Gluten/end-use quality 
Drought tolerance 
Flowering time 
Flowering time 
Flowering time 
GFe/GZn 

GFe/GZn 
Gluten/End-use quality 
Gluten/End-use quality 
Grain morphology 
Flowering time 
Flowering time 
GFe/GZn 

GFe/GZn 
Gluten/end-use quality 
Flour color 
Flowering time 

GFe 

GFe/GZn 

GFe/GZn 

GFe/GZn 

Grain morphology 
Grain morphology 
Grain morphology 
Grain number 

NUE 

Plant height 

PPO activity 

PPO activity 

Root growth angle 
Root length 

Seed dormancy/PHS 
Flowering time 

GFe 

GFe/GZn 

GFe/GZn 

GFe/GZn 

Grain morphology 
Grain morphology 
Grain number 

Grain size 

NUE 

PPO activity 

PPO activity 

Root depth 

Root growth angle 
Root length 

Seed dormancy/PHS 
Flour color 


Flowering time 


Crop ontology 

CO_321:0000224 
CO_321:0000224 
CO_321:0000152 
CO_321:0000155 
CO_321:0000131 
CO_321:0000007 
CO_321:0000007 
CO_321:0000007 
CO_321:0000224 
CO_321:0000224 
CO_321:0000153 
CO_321:0000156 
CO_321:0000040 
CO_321:0000007 
CO_321:0000007 
CO_321:0000224 
CO_321:0000224 
CO_321:0000154 
CO_321:0000214 
CO_321:0000007 
CO_321:0000222 
CO_321:0000224 
CO_321:0000224 
CO_321:0000224 
CO_321:0000040 
CO_321:0000040 
CO_321:0000040 
CO_321:0000391 
CO_321:0001671 
CO_321:0000020 
CO_321:0000214 
CO_321:0000214 


CO_321:0000081 
CO_321:0000007 
CO_321:0000222 
CO_321:0000224 
CO_321:0000224 
CO_321:0000224 
CO_321:0000040 
CO_321:0000040 
CO_321:0000391 
CO_321:0000040 
CO_321:0001671 
CO_321:0000214 
CO_321:0000214 


CO_321:0000081 
CO_321:0000214 
CO_321:0000007 


Position 
chr1A:487330367.0.487333932 
chr1A:487463045.0.487466385 
chr1A:508723999.0.508726319 
chr1A:4202215.0.4203588 
chr1B:373628259.0.373629490 
chr1B:59192897.0.59197677 
chr1B:685645287.0.685649392 
chr1B:581413558.0.581414952 
chr1B:520925847.0.520929216 
chr1B:520998902.0.521002315 
chr1B:555765127.0.555766152 
chr1B:568661 1.0.5687693 
chr1B:668129122.0.668132472 
chr1D:493484553.0.493488588 
chr1D:492606158.0.492620025 
chr1D:387796590.0.387800918 
chr1D:387894784.0.387898194 
chr1D:412160786.0.412163311 
chr2A:321150418.0.321156866 
chr2A:36933684.0.36938202 
chr2A:570192811.0.570195203 
chr2A:14976663.0.14978691 
chr2A:19162944.0.19164224 
chr2A:49221108.0.49222130 
chr2A:121141338.0.121145857 
chr2A:508030243.0.508033950 
chr2A:134273284.0.134275604 
chr2A:66848645.0.66849948 
chr2A:729293649.0.729297303 
chr2A:755768802.0.755776624 
chr2A:712187112.0.712189567 
chr2A:712344578.0.712346518 
chr2A:142707925.0.142709726 
chr2A:162291365.0.162292945 
chr2A:158452418.0.158453410 
chrUn:293689186.0.293692375 
chr2B:492146188.0.492148400 
chr2B:23548049.0.23551608 
chr2B:29118956.0.29 120236 
chr2B:72895029.0.72896639 
chr2B:171030429.0.171034964 
chr2B:448904796.0.448907800 
chr2B:573974813.0.573975706 
chr2B:4646554.0.4654607 
chr2B:722629776.0.722634436 
chr2B:688478 142.0.688480649 
chr2B:689764554.0.689766587 
chr2B:89554121.0.89558883 
chr2B:191742224.0.191744048 
chr2B:197210852.0.197212507 
chr2B:200572827.0.200573807 
chr2D:234144711.0.234150925 
chr2D:33952224.0.33955766 


Traes ID 
TraesCS1A02G291100 
TraesCS 1A02G291200 
TraesCS 1A02G317311 
TraesCS 1A02G008000 
TraesCS 1B02G206600 
TraesCS 1B02G076300 
TraesCS 1B02G477400 
TraesCS 1B02G351100 
TraesCS 1B02G300500 
TraesCS 1B02G300600 
TraesCS 1B02G329711 
TraesCS1B02G011700 
TraesCS 1B02G449700 
TraesCS 1D02G451200 
TraesCS 1D02G450200 
TraesCS 1D02G289700 
TraesCS 1D02G289800 
TraesCS 1D02G317211 
TraesCS2A02G238400 
TraesCS2A02G081900 
TraesCS2A02G336600 
TraesCS2A02G033500 
TraesCS2A02G049900 
TraesCS2A02G095700 
TraesCS2A02G168200 
TraesCS2A02G295400 
TraesCS2A02G175700 
TraesCS2A02G1 16900 
TraesCS2A02G500400 
TraesCS2A02G547800 
TraesCS2A02G468200 
TraesCS2A02G468500 
TraesCS2A02G182900 
TraesCS2A02G194200 
TraesCS2A02G191400 
TraesCSU02G196100 
TraesCS2B02G345300 
TraesCS2B02G047100 
TraesCS2B02G060800 
TraesCS2B02G111100 
TraesCS2B02G194200 
TraesCS2B02G313700 
TraesCS2B02G405700 
TraesCS2B02G007700 
TraesCS2B02G528300 
TraesCS2B02G491000 
TraesCS2B02G491400 
TraesCS2B02G122400 
TraesCS2B02G209500 
TraesCS2B02G212700 
TraesCS2B02G215300 
TraesCS2D02G236500 
TraesCS2D02G079600 
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Table 9.2 (continued) 


Gene 
TaVIT1-2D 
TaNASI-D 
TaNAS3-D 
TaNAS9-D 
TaCYP78A5 
WFZP-D1 
TaDAI-D 
TaGS2-D1 
Rht8 
PPO-D1 
Ppo2-D1 
RMD-DI 
TaRSLA 
Lyce-Al 
TaGS5-A1 
Pod-A1 
Tamyb10-A1 
Phs1 
Lyce-B1 
Fhb1_His 
TaNASS-B 
Tamyb10-B1 
Vp1B1 
COMT-3B 
CKX-D1 
Myb10-D1 
ALPb-4A 
PRR73-A1 
TaDMASI-A 
TaNAS6-A 
TaCYP78A5 
TaGSI-A1 
MORI-AI 
Lox-B1 
PRR73-B1 
TaDMAS1-B 
TaNAS6-B 
TaGS1-B1 
Pds-B1 
Rht-Bl 
TaERF73-D1 
MORI-B1 
TaDMASI-D 
TaNAS6-D 
TaD14-4D 
TaGS1-D1 
Rht-D1 
TaERF73-A1 
MORI-DI 
Lr67 
Drol-Al 
Vrn-Ala 
TaNASA4-A 
TaDepl-A1 


Chr 
2D 
2D 
2D 
2D 
2D 
2D 
2D 
2D 
2D 


Phenotype 

GFe 

GFe/GZn 
GFe/GZn 
GFe/GZn 

Grain morphology 
Grain number 
Grain size 

NUE 

Plant height 

PPO activity 

PPO activity 

Root growth angle 
Root length 
End-use quality 
Grain morphology 
POD activity/quality 
Seed color/PHS 
Seed dormancy/PHS 
End-use quality 
FHB resistance 
GFe/GZn 

Seed color/PHS 
Seed dormancy/PHS 
WSC/drought 
Grain morphology 
Seed color/PHS 
End-use quality 
Flowering time 
GFe/GZn 
GFe/GZn 

Grain morphology 
NUE 

Root length 

Flour color 
Flowering time 
GFe/GZn 
GFe/GZn 

NUE 

PDS activity/quality 
Plant height 

Root depth 

Root length 
GFe/GZn 
GFe/GZn 

Grain yield 

NUE 

Plant height 

Root depth 

Root length 

Rust resistance 
Drought tolerance 
Flowering time 
GFe/GZn 

Grain morphology 


Crop ontology 

CO. 321:0000222 
CO. 321:0000224 
CO. 321:0000224 
CO. 321:0000224 
CO. 321:0000040 
CO. 321:0000391 
CO. 321:0000040 
CO. 321:0001671 
CO. 321:0000020 
CO. 321:0000214 
CO. 321:0000214 


CO. 321:0000214 
CO. 321:0000040 
CO. 321:0000214 
CO. 321:0000037 
CO. 321:0000081 
CO. 321:0000214 
CO. 321:0000651 
CO. 321:0000224 
CO. 321:0000037 
CO. 321:0000081 
CO. 321:0000131 
CO. 321:0000040 
CO. 321:0000037 
CO. 321:0000070 
CO. 321:0000007 
CO. 321:0000224 
CO. 321:0000224 
CO. 321:0000040 
CO 321:0001671 


CO. 321:0000214 
CO. 321:0000007 
CO. 321:0000224 
CO. 321:0000224 
CO. 321:0001671 
CO. 321:0000214 
CO. 321:0000020 


CO. 321:0000224 
CO. 321:0000224 
CO. 321:0000013 
CO. 321:0001671 
CO. 321:0000020 


CO. 321:0000131 
CO. 321:0000007 
CO. 321:0000224 
CO. 321:0000040 


Position 
chr2D:419781553.0.419783725 
chr2D:12870350.0.12873858 
chr2D:18168587.0.18170017 
chr2D:45799198.0.45800220 
chr2B:181118653.0.181120839 
chr2D:67496011.0.67496898 
chr2D:8281359.0.8289277 
chr2D:595161545.0.595165983 
chrUn:24893964.0.24897255 
chr2D:572952347.0.572954307 
chr2D:573903210.0.573905141 
chr2D:134790880.0.134792691 
chr2D:138754346.0.138756038 
chr3A:370233784.0.370237786 
chr3A:176555776.0.176559839 
chr3A:730397626.0.730398805 
chr3A:703905707.0.703905910 
chr3A:7294435.0.7297613 
chr3B:377418979.0.377422751 
chr3B:8526628.0.8529572 
chr3B:40773361.0.40778748 
chr3B:757918298.0.757920082 
chr3B:693338001.0.693342761 
chr3B:829391763.0.829392973 
chr3D:106736525.0.106740667 
chr3D:570801243.0.570803210 
chr4A:718033180.0.718034037 
chr4A:119083489.0.119087436 
chr4A:74150821.0.74153009 
Chr4A:148780629.0.148781781 
chr2D:127258537.0.127260686 
chr4A:60668121.0.60671232 
chr4A:685380302.0.685381598 
chr4B:27248262.0.27252524 
chr4B:427491684.0.427496233 
Chr4B:481847465.0.481849531 
Chr4B:402432887.0.402433879 
chr4B:499898695.0.499901767 
chr4B:586575839.0.586580177 
chr4B:30861382.0.30863247 
chr4D:467792044.0.467801204 
chr4B:605691920.0.605693239 
chr4D:392726584.0.392728858 
chr4D:323095782.0.323098145 
chr4D:428116830.0.428119151 
chr4D:403145655.0.403148815 
chr4D:18781062.0.18782933 
chr4A:3351141.0.3352418 
chr4D:478997945.0.478999338 
chr4D:405770870.0.405775112 
chr5A:428994186.0.428997632 
chr5A:587411824.0.587423240 
chr5A:705402044.0.705403372 
chr5A:430486331.0.430493530 


Traes ID 
TraesCS2D02G326300 
TraesCS2D02G033000 
TraesCS2D02G049200 
TraesCS2D02G094200 
TraesCS2B02G201900 
TraesCS2D02G1 18200 
TraesCS2D02G016900 
TraesCS2D02G500600 
TraesCSU02G024900 
TraesCS2D02G468200 
TraesCS2D02G468600 
TraesCS2D02G190700 
TraesCS2D02G193700 
TraesCS3A02G208800 
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TraesCS3A02G212900LC 


TraesCS3A02G5 10600 


TraesCS3A02G631500LC 


TraesCS3A02G006600 
TraesCS3B02G239100 
TraesCS3B02G019900 
TraesCS3B02G068500 
TraesCS3B02G515900 
TraesCS3B02G452200 
TraesCS3B02G612000 
TraesCS3D02G143500 
TraesCS3D02G468400 
TraesCS4A02G453800 
TraesCS4A02G105300 
TraesCS4A02G074800 


TraesCS4A02G127900LC 


TraesCS2D02G183000 
TraesCS4A02G063800 
TraesCS4A02G415400 
TraesCS4B02G037700 
TraesCS4B02G198700 


TraesCS4B02G400500LC 


TraesCS4B02G183900 
TraesCS4B02G240900 
TraesCS4B02G300100 
TraesCS4B02G043100 


TraesCS4D02G406100LC 


TraesCS4B02G316200 
TraesCS4D02G232200 
TraesCS4D02G184900 
TraesCS4D02G258000 
TraesCS4D02G240700 
TraesCS4D02G040400 


TraesCS4A02G003300LC 


TraesCS4D02G3 12800 
TraesCS4D02G243 100 
TraesCS5A02G213300 
TraesCS5A02G391700 
TraesCS5A02G552400 
TraesCS5A02G215100 
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Table 9.2 (continued) 


Gene 
TaGL3.3-5A 
Egt2-Al 
Drol-B1 
Vrn-Blb 
TaDep1-B1 
TaGL3.3-5B 
Egt2-Bl 
Drol-D1 
Vrn-D1 
TaDep1-D1 
TaGL3.3-5D 
Pina-D1 
Pinb-D1 
Egt2-D1 
TaNAS2-A 
TaNAS7-A2 
TaNAS7-A1 
TaGW2-6A 
TaT6P 
SPL21-6A 
NAM-AI 
Kat-2A 
Rht-24 
Rht24 
TaNAS2-B 
TaNAS7-B 
SPL21-6B 
GW2-6B 
NAM-BI 
KAT-2B 
lfehw3 
TaNAS2-D2 
TaNAS2-D1 
TaNAS7-D 
SPL21-6D 
TaGSla 
Moc-Al 
ALPa-7A 
ALPb-7A 
PSY-AI 
TEF-7A 
Sus1-7A1 
TaGW7 
SPL20-7A 
AGP-S-7A 
WAPO-AI 
VRT-A2 
FRK2-7A 


PSY-BI 
TaSus1-7B 
WAPO-B1 
PIN-B2 
TaCOL-B5 


Chr 
5A 
5A 
5B 
5B 
5B 
5B 


Phenotype 

Grain morphology 
Root growth angle 
Drought tolerance 
Flowering time 
Grain morphology 
Grain morphology 
Root growth angle 
Drought tolerance 
Flowering time 
Grain morphology 
Grain morphology 
Grain texture 
Grain texture 
Root growth angle 
GFe/GZn 
GFe/GZn 
GFe/GZn 

Grain morphology 
Grain morphology 
Grain morphology 
Grain protein 
Grain weight 
Plant height 

Plant height 
GFe/GZn 
GFe/GZn 

Grain morphology 
Grain morphology 
Grain protein 
Grain weight 
WSC/Drought 
GFe/GZn 
GFe/GZn 
GFe/GZn 

Grain morphology 
Nitrogen use efficiency 
Agronomic traits/drought 
End-use quality 
End-use quality 
Flour color 

Grain morphology 
Grain morphology 
Grain morphology 
Grain morphology 
Grain morphology 
Grain number 
Grain number 
Starch synthesis/grain 
morphology 

Flour color 

Grain morphology 
Grain number 
Grain texture 


Grain yield 


Crop ontology 
CO_321:0000979 


CO_321:0000131 
CO_321:0000007 
CO_321:0000040 
CO_321:0000979 


CO_321:0000131 
CO_321:0000007 
CO_321:0000040 
CO_321:0000979 
CO_321:0000072 
CO_321:0000072 


CO_321:0000224 
CO_321:0000224 
CO_321:0000224 
CO_321:0000980 
CO_321:0000040 
CO_321:0000040 
CO_321:0000073 
CO_321:0000025 
CO_321:0000020 
CO_321:0000020 
CO_321:0000224 
CO_321:0000224 
CO_321:0000040 
CO_321:0000040 
CO_321:0000073 
CO_321:0000025 
CO_321:0000131 
CO_321:0000224 
CO_321:0000224 
CO_321:0000224 
CO_321:0000040 
CO_321:0001671 
CO_321:0000131 
CO_321:0000070 
CO_321:0000070 
CO_321:0000214 
CO_321:0000040 
CO_321:0000040 
CO_321:0000980 
CO_321:0000040 
CO_321:0000040 
CO_321:0000391 
CO_321:0000391 
CO_321:0001674 


CO_321:0000214 
CO_321:0000040 
CO_321:0000391 
CO_321:0000072 
CO_321:0000013 


Position | Traes ID
chr5A:26440090.0.26449927 | TraesCS5A02G030300
chr5A:151732800.0.151736140 | TraesCS5A02G102000
chr5B:381041995.0.381044714 | TraesCS5B02G210500
chr5B:573803238.0.573815903 | TraesCS5B02G396600
chr5B:378517204.0.378520796 | TraesCS5B02G208700
chr5B:27830119.0.27840027 | TraesCS5B02G029100
chr5B:304265954.0.304269177 | TraesCS5B02G164200
chr5D:327631371.0.327634216 | TraesCS5D02G218700
chr5D:467176608.0.467184463 | TraesCS5D02G401500
chr5D:326126003.0.326129557 | TraesCS5D02G216900
chr5D:37321983.0.37331860 | TraesCS5D02G038500
chr5D:3591495.0.3592002 | TraesCS5D02G004100
chr5D:3609640.0.3610146 | TraesCS5D02G004300
chr5D:131504758.0.131508027 | TraesCS5D02G113600
chr6A:158316641.0.158317931 | TraesCS6A02G163100
chr6A:603249197.0.603250189 | TraesCS6A02G386200
chr6A:60971892.0.60973259 | TraesCS6A02G093000
chr6A:237734835.0.237759808 | TraesCS6A02G189300
chr6A:461145380.0.461147406 | TraesCS6A02G248400
chr6A:136541506.0.136544204 | TraesCS6A02G152000
chr6A:77098570.0.77100127 | TraesCS6A02G108300
chr6A:606969628.0.606973059 | TraesCS6A02G392400
chr6A:413732327.0.413735532 | TraesCS6A02G221900
chr6A:432253559.0.432257969 | TraesCS6A02G229500
chr6B:212158654.0.212159706 | TraesCS6B02G186000
chr6B:694258986.0.694259978 | TraesCS6B02G425200
chr6B:200509075.0.200512019 | TraesCS6B02G180300
chr6B:291761397.0.291778503 | TraesCS6B02G215300
chr6B:134662733.0.134665065 | TraesCS6B02G207500LC
chr6B:701871007.0.701874630 | TraesCS6B01G432600
chr6B:57283367.0.57288151 | TraesCS6B02G080700
chr6D:121579210.0.121580540 | TraesCS6D02G148600
chr6D:121225536.0.121228339 | TraesCS6D02G148200
chr6D:456540490.0.456541773 | TraesCS6D02G370800
chr6D:111567638.0.111570051 | TraesCS6D02G142100
chr6D:386290812.0.386294394 | TraesCS6D02G383600LC
chr7A:557553815.0.557555303 | TraesCS7A02G382800
chr7A:15697493.0.15698020 | TraesCS7A02G035500
chr7A:15639003.0.15639854 | TraesCS7A02G035200
chr7A:729397558.0.729401208 | TraesCS7A02G557300
chr7A:66228020.0.66229066 | TraesCS7A02G108900
chr7A:115204109.0.115208145 | TraesCS7A02G158900
chr7A:205459137.0.205465028 | TraesCS7A02G233600
chr7A:685212680.0.685214713 | TraesCS7A02G495000
chr7A:342609326.0.34261711 | TraesCS7A02G287400
chr7A:674081462.0.674082918 | TraesCS7A02G481600
chr7A:128826237.0.128833021 | TraesCS7A02G175200
chr7A:459209231.0.459211266 | TraesCS7A02G319000
chr7B:739442503.0.739445446 | TraesCS7B02G482000
chr7B:68344330.0.68348404 | TraesCS7B02G063400
chr7B:649950255.0.649951851 | TraesCS7B02G384000
chr7B:699388914.0.609389366 | TraesCS7B02G431200
chr7B:667070044.0.667071768 | TraesCS7B02G400600




Table 9.2 (continued) 


Gene | Chr | Phenotype | Crop ontology | Position | Traes ID
PSY-D1 | 7D | Flour color | CO_321:0000214 | chr7D:636766504.0.636770671 | TraesCS7D02G553300
Vrn-D3 | 7D | Flowering time | CO_321:0000007 | chr7D:68416507.0.68417532 | TraesCS7D02G111600
GS3-D1 | 7D | Grain morphology | CO_321:0000040 | chr7D:6483394.0.6485745 | TraesCS7D02G015000
SPL20-7D | 7D | Grain morphology | CO_321:0000040 | chr7D:592816295.0.592819560 | TraesCS7D02G482400
Lr34 | 7D | Rust resistance |  | chr7D:47412273.0.47424077 | TraesCS7D02G080300
TaNAS4-D | UNK | GFe/GZn | CO_321:0000224 | chrUn:108595828.0.108597155 | TraesCSU02G125200
TaDAI-A | UNK | Grain size | CO_321:0000040 | chrUn:11740231.0.11748045 | TraesCSU02G007800
TaERF73-B1 |  | Root depth |  | chr4B:585962983.0.585964402 | TraesCS4B02G299500


nitrogen-dependent lesion mimic gene Ndhrl1 (Li et al. 2016), powdery mildew resistance gene Pm4b (Wu et al. 2018), leaf senescence gene els1 (Li et al. 2018), stripe rust resistance gene Yr26 (Wu et al. 2018), YrMM56, YrHY1 (Wang et al. 2018a, b), dwarfing gene Rht12 (Sun et al. 2019), and Pm61 (Hu et al. 2019). It is likely that this approach will get more attention because it replaces the genotyping of complete populations (Zou et al. 2016).

Very few genes in wheat have been discovered using the traditional map-based cloning approach; most have been identified by comparative genomics between wheat and related grass species, exploiting the high collinearity and conserved genetic organization among grass genomes (Rasheed and Xia 2019; Chen et al. 2020). According to a recent literature search, almost 33 genes related to grain morphology have been isolated by homology-based cloning, and functional markers have been developed for use in breeding (Table 9.1). Likewise, genes related to other morphological and phenological traits have been isolated, including TaPRR73 (Zhang et al. 2016) and TaZIM-A1 (Liu et al. 2018) underpinning flowering time; TaPPH-7A (Wang et al. 2018a, b) underpinning morphological traits; TaARF4 (Wang et al. 2019b) controlling root growth and plant height; and TaSnRK2.9-5A (Ur Rehman et al. 2019) controlling drought tolerance.
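The Position and Traes ID columns of Table 9.2 follow regular patterns, so they can be checked mechanically. The following is a minimal sketch, not part of the original chapter. It assumes that positions are written as chromosome, start and end separated by the ".0." string exactly as printed in the table (a plain ".." is also accepted), and that a Traes ID is read as "TraesCS", a chromosome code ("U" for unanchored scaffolds), a two-digit release code, "G", a serial number and an optional "LC" suffix for low-confidence models. The function and class names are illustrative only.

```python
import re
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int
    end: int


class TraesId(NamedTuple):
    chrom: str            # "7D", "4B", ... or "U" for unanchored scaffolds
    release: str          # two-digit code following the chromosome, e.g. "02"
    number: str           # gene serial number
    low_confidence: bool  # True when the ID carries the "LC" suffix


# Accepts the ".0." separator printed in Table 9.2 as well as a plain "..".
_POSITION = re.compile(r"^(chr\w+):(\d+)\.0?\.(\d+)$")
_TRAES = re.compile(r"^TraesCS(\d[ABD]|U)(\d{2})G(\d+)(LC)?$")


def parse_position(text: str) -> Interval:
    """Split a 'chr7D:start.0.end' string into chromosome, start and end."""
    match = _POSITION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized position: {text!r}")
    return Interval(match.group(1), int(match.group(2)), int(match.group(3)))


def parse_traes_id(text: str) -> TraesId:
    """Split a Traes ID into its assumed components."""
    match = _TRAES.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized Traes ID: {text!r}")
    chrom, release, number, lc = match.groups()
    return TraesId(chrom, release, number, lc is not None)


def same_chromosome(position: Interval, gene: TraesId) -> bool:
    """Check that the chromosome in the position matches the one in the ID."""
    expected = "chrUn" if gene.chrom == "U" else "chr" + gene.chrom
    return position.chrom == expected


if __name__ == "__main__":
    # The Lr34 row of Table 9.2.
    pos = parse_position("chr7D:47412273.0.47424077")
    gene = parse_traes_id("TraesCS7D02G080300")
    print(pos, gene, same_chromosome(pos, gene))
```

A check of this kind flags rows in which the chromosome of the interval and the chromosome encoded in the gene ID disagree, and rows in which the end coordinate is smaller than the start coordinate.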


9.3.1 Functional Genomics and Map-based Cloning in Wheat


The continuous development of new genomic resources in wheat, including new reference genomes, transcriptome resources, TILLING mutants with exome sequencing data, and high-density SNP databases, facilitates map-based cloning to discover new genes. A QTL for head length and spikelet number was identified and then fine mapped to an interval of 0.2 cM (Yao et al. 2019). The map-based cloning work designated Head Length 2 (HL2) as the gene controlling head length and spikelet number. Zhang et al. (2018) fine mapped a heading time gene, TaHdm605, in an EMS mutant line. Spike architecture is an important yield-related attribute, and three genes controlling it, TaTFL1-2D, TaHOX2-2B, and TaAGLG1-5A, were discovered by analyzing large-scale transcriptome data from 90 wheat lines (Wang et al. 2017b). The effects of these genes were validated by transgenic assays. Another approach used for gene discovery was the screening of a yeast cDNA library constructed from the heat- and drought-tolerant wheat cv. HANXUAN 10. Using this approach, TaPR-1-1, which confers abiotic stress tolerance and encodes a member of the pathogenesis-related (PR) protein family, was identified (Wang et al. 2019a).
The development of male sterile lines is an important component of hybrid wheat breeding programs. Two studies simultaneously cloned the Male Sterile 2 (Ms2) gene underpinning male sterility in wheat (Ni et al. 2017; Xia et al. 2016). The causal mutation was identified as a terminal-repeat retrotransposon in miniature (TRIM) element in the promoter of Ms2; the TRIM element activates the gene and thereby causes male sterility. Liu et al. (2019) cloned TaSPL8, a gene that controls leaf angle, is an important component of the auxin and brassinosteroid pathways, and is associated with cell elongation. Knockout mutants of TaSPL8 had erect leaves due to the loss of the lamina joint, a compact architecture, and increased spike number. Pm21 is a durable disease resistance gene derived from Haynaldia villosa that confers resistance against powdery mildew, and wheat cultivars carrying Pm21 are currently cultivated on 4 million ha in China (Cao et al. 2011). Two complementary studies cloned Pm21 and identified that it encodes a typical CC-NBS-LRR protein involved in broad-spectrum resistance to powdery mildew (He et al. 2019).

Fusarium head blight (FHB) is one of the most important yield- and quality-limiting factors in wheat globally. There are very few resources providing durable resistance to FHB in wheat, including some landraces from China like SUMAI 3, which is known to carry the Fhb1 gene. Rawat et al. (2016) used multiple approaches, including positional cloning, development of overexpression lines, and gene silencing, to report that a pore-forming toxin-like (PFT) gene was the candidate for Fhb1. However, it was later found that several FHB-susceptible cultivars also carry PFT, and its candidacy was doubted. Two new studies further established that a histidine-rich calcium-binding (TaHRC or His) gene adjacent to PFT is the actual Fhb1. One identified it as a susceptibility factor (Su et al. 2019). In contrast, Li et al. (2019) concluded that Fhb1 is a gain-of-function gene and that the newly generated protein acts as a regulator of host immunity.
9.3.2 Functional Genes and Their Diagnostic Markers


All the above examples show the discovery of genes following different strategies and include various validation approaches. Once a gene is discovered and its phenotypic effect is validated, it becomes important to identify and select the favorable alleles of that gene in breeding using functional markers (FMs). FMs are PCR-based diagnostic markers designed to detect the causal polymorphism underpinning phenotypic differences. They are routinely used in crop breeding programs to identify and select the desirable allelic variants of specific functional genes (Liu et al. 2012; Rasheed et al. 2017; Rasheed and Xia 2019; Rouse et al. 2019). As mentioned earlier, their high diagnostic value makes FMs ideal markers for identifying and pyramiding different genes in marker-assisted recurrent selection. FMs are also used in genomic selection to improve selection accuracy. Rasheed et al. (2016) converted a collection of 72 FMs to kompetitive allele-specific PCR (KASP) format for use on high-throughput platforms. This effort now includes 157 KASP markers to diagnose alleles of traits of breeding interest. These KASP markers have been used by various breeding programs, and a recent estimate from citations indicated that more than 35 wheat breeding and genetics programs around the world have used them. For example, CIMMYT elite lines were tagged with the TaGS3-D1, TaTGW6, and TaSus1 genes using these KASP markers (Sehgal et al. 2019). Zhao et al. (2019) screened 1152 diverse global wheat germplasm lines with KASP markers for 47 functional genes underpinning a number of important traits of breeding interest. Favorable alleles of more than 39 genes of breeding importance were also identified in East African wheat germplasm using the aforementioned KASP markers (Wamalwa et al. 2020).
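How diagnostic marker calls feed into the pyramiding of favorable alleles can be sketched in a few lines. The example below is an illustration only: the line names, marker names and allele codes are invented placeholders, not data from any of the cited studies, and the two helper functions are hypothetical. It selects the lines that carry the favorable allele at every target marker and reports the favorable allele frequency per marker, treating missing calls as unscored.

```python
from typing import Dict, List, Optional

Calls = Dict[str, Dict[str, Optional[str]]]  # line -> {marker: allele call or None}


def lines_with_all_favorable(calls: Calls, favorable: Dict[str, str]) -> List[str]:
    """Return lines carrying the favorable allele at every target marker."""
    selected = []
    for line, genotype in calls.items():
        if all(genotype.get(marker) == allele for marker, allele in favorable.items()):
            selected.append(line)
    return sorted(selected)


def favorable_allele_frequency(calls: Calls, favorable: Dict[str, str]) -> Dict[str, float]:
    """Frequency of the favorable allele per marker, ignoring missing calls."""
    frequencies = {}
    for marker, allele in favorable.items():
        scored = [g[marker] for g in calls.values() if g.get(marker) is not None]
        if scored:
            frequencies[marker] = sum(call == allele for call in scored) / len(scored)
        else:
            frequencies[marker] = float("nan")  # no line was scored for this marker
    return frequencies


if __name__ == "__main__":
    # Invented example data: three placeholder markers scored on four lines.
    favorable = {"marker_A": "A", "marker_B": "G", "marker_C": "T"}
    calls = {
        "Line1": {"marker_A": "A", "marker_B": "G", "marker_C": "T"},
        "Line2": {"marker_A": "A", "marker_B": "C", "marker_C": "T"},
        "Line3": {"marker_A": "A", "marker_B": "G", "marker_C": None},
        "Line4": {"marker_A": "C", "marker_B": "G", "marker_C": "T"},
    }
    print(lines_with_all_favorable(calls, favorable))
    print(favorable_allele_frequency(calls, favorable))
```

Treating a missing call as "not favorable" during selection is a conservative design choice; a breeding program might instead re-assay such lines before discarding them.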
Several commercial alternatives to the KASP master mix are now available, which have made SNP genotyping more cost effective. Apart from these commercial alternatives to the KASP technology, some open-source SNP genotyping methods are also available. Two examples are the semi-thermal asymmetric reverse PCR (STARP) (Long et al. 2017) and Amplifluor (Jatayev et al. 2017) methods, which can be used with a wide range of commercial master mixes. Several SNP markers were converted to STARP format to further reduce the cost of genotyping (Wu et al. 2020).


9.4 Mining Gene Networks Using Database Resources


We have outlined many genome sequencing projects carried out to generate genome variation data in wheat populations (Table 9.1). The sheer amount of genome sequencing data being generated in wheat can hinder scientists from translating complex and sometimes contradictory information into biological understanding and discoveries. Apart from their use in investigating genetic diversity, population-level genomic variation data provide valuable resources and great opportunities for identifying trait-related genes, designing markers, constructing gene trees, exploring evolutionary history, and assisting the design of molecular breeding. Mining the relevant information from the extensive genome variation datasets is a time-consuming and error-prone process if the proper tools are not used to explore the genes in question. New tools are needed to explain how genes and gene networks might be implicated in a complex trait or disease. Another limitation is that tapping large and complex genome variation datasets requires computational skills exceeding the abilities of most crop breeders. In a nutshell, the reuse of genomic variation data plays an important role in driving current plant science research. We have provided an overview of the various genome variation tools and resources for quick analysis of genes and gene networks (Table 9.3).
9.4.1 Gene-gene Synteny Using PRETZEL


In defining a genetic framework at the genome level, similarity searches with transcripts and proteins are of primary importance. In this context, features of genome structure such as sequence/gene repetition affect the capacity to identify the correct gene for detailed analysis. Sequence alignments underpin all such studies. The capacity to visualize genome features, such as uneven repetition between loci aligned across several genomes (Fig. 9.1), helps to anticipate complications that arise when gene alignments are carried out without this prior knowledge.

PRETZEL (https://plantinformatics.io; Keeble-Gagnere et al. 2019) is an online, interactive, and real-time visualization tool for analyzing and integrating genetic and genomic datasets. In Fig. 9.1, the alignments of the fructosyltransferase genes at the fructan synthesis locus on 7AS for the wheat cv. LANCER, cv. CHINESE SPRING (CS), and cv. MACE are shown as a complex example: the IWGSC 7A-LANCER 7A alignment of the array of GH32 genes is fully syntenic between gene models within the LANCER and CS loci. In contrast, the IWGSC 7A-MACE 7A alignment is evidently ambiguous as a result of small genome rearrangements, possibly due to assembly errors. PRETZEL enables any locus of interest to be analyzed and potential issues to be identified.
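The kind of rearrangement that such a synteny view exposes can also be flagged programmatically. The sketch below is a simplified illustration and is not how PRETZEL itself works: it assumes that the gene order in a reference assembly and the start coordinate of each shared gene in a second assembly are already known, and it reports runs of consecutive genes whose coordinates decrease in the second assembly, which is the signature of an inverted block. The gene labels, coordinates and the `min_genes` threshold are hypothetical.

```python
from typing import Dict, List, Tuple


def find_inverted_runs(
    ref_order: List[str],
    query_pos: Dict[str, int],
    min_genes: int = 3,
) -> List[Tuple[str, str]]:
    """Return (first_gene, last_gene) for runs of genes in reversed order.

    ref_order -- gene IDs in their order along the reference chromosome
    query_pos -- start coordinate of each gene in the other assembly
    min_genes -- shortest run of consecutive reversed genes to report
    """
    # Only genes found in both assemblies can be compared.
    shared = [gene for gene in ref_order if gene in query_pos]
    runs = []
    run_start = 0
    for i in range(1, len(shared) + 1):
        decreasing = (
            i < len(shared) and query_pos[shared[i]] < query_pos[shared[i - 1]]
        )
        if not decreasing:
            # The run shared[run_start .. i-1] has just ended.
            if i - run_start >= min_genes:
                runs.append((shared[run_start], shared[i - 1]))
            run_start = i
    return runs


if __name__ == "__main__":
    # Hypothetical locus: genes g3..g6 are inverted in the second assembly.
    reference_order = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
    other_assembly = {
        "g1": 10, "g2": 20, "g3": 60, "g4": 50,
        "g5": 40, "g6": 30, "g7": 70, "g8": 80,
    }
    print(find_inverted_runs(reference_order, other_assembly))
```

Whether a reported run reflects a true inversion or an assembly error still has to be judged against scaffold boundaries, as discussed for Fig. 9.1.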

Table 9.3 Genomics databases in wheat for genome-informed characterization of wheat genes

Name | URL | Description | Reference
GrainGenes | https://wheat.pw.usda.gov/GG3/ | A comprehensive resource for molecular and phenotypic information for Triticeae and Avena | Odell et al. (2017)
MASWheat | http://maswheat.ucdavis.edu/ | Marker-assisted selection database for wheat | NA
expVIP | http://wheat-expression.com/ | Wheat transcriptome resources for expression analysis | Borrill et al. (2016)
WheatExp | https://wheat.pw.usda.gov/WheatExp/ | Homoeologue-specific database of gene expression profiles for polyploid wheat | Pearce et al. (2015)
CerealsDB | http://www.cerealsdb.uk.net/cerealgenomics/CerealsDB/indexNEW.php | Database for SNPs, genotyping arrays and sequences | NA
WheatIS | http://wheatis.org/ | Wheat information system for wheat data, resources and bioinformatics tools | NA
OpenWildWheat | http://www.openwildwheat.org/ | Sequencing resources of Ae. tauschii accessions | Gaurav et al. (2022)
IWGSC | http://www.wheatgenome.org/ | Official website of IWGSC | NA
10+ Wheat genomes | http://www.10wheatgenomes.com/ | Wheat pan-genome resources | NA
Polymarker | http://polymarker.tgac.ac.uk/ | SNP assay development tool | Ramirez-Gonzalez et al. (2015)
Triticeae tool box | https://triticeaetoolbox.org/wheat/ | Repository of wheat data from wheat CAP | Blake et al. (2016)
Wheat Transcription factors | http://itak.feilab.net/ | Database of wheat transcription factors | NA
TILLING | http://www.wheat-tilling.com/ | Sequencing resource of CADENZA (6x) and KRONOS (4x) wheat TILLING population | Krasileva et al. (2017)
WGIN | http://www.wgin.org.uk/about.php | Wheat genetic improvement network | NA
URGI | http://wheat-urgi.versailles.inra.fr/ | INRA-based resources for wheat sequence resources | NA
Gramene | http://www.gramene.org/ | Open-source, integrated data resource for comparative functional genomics in crops and model plant species | Sun et al. (2022)
KnetMiner | http://knetminer.rothamsted.ac.uk/Triticum_aestivum/ | Open-source software tools for integrating and visualizing large biological datasets | Hassani-Pak and Rawlings (2017)
Wheat SnpHub Portal | http://wheat.cau.edu.cn/Wheat_SnpHub_Portal/ | A web interface to call variation data and map allele frequencies in global wheat populations based on exome capture and resequencing data | Wang et al. (2020)
Wheat Gmap | https://www.wheatgmap.org/ | Bulk segregation analysis based on RNA or DNA sequencing data | Zhang et al. (2021)
WheatOmics | http://wheatomics.sdau.edu.cn/ | Several wheat omics tools including blast, ID converter, sequence retriever, SNP marker | Ma et al. (2021)
WheatGene | http://wheatgene.agrinome.org | A Drupal-based interactive genome search database of wheat genomes and RNAseq | Garcia et al. (2021)
ggCOMP | http://wheat.cau.edu.cn/WheatCompDB/ | A wheat resequencing database to enable unsupervised identification of pairwise germplasm resource-based identity by descent (gIBD) blocks | Yang et al. (2022)
ccnWHEAT | http://bioinformatics.cau.edu.cn/ccnWheat | A platform for searching and comparing specific functional co-expression networks, as well as identifying the related functions of the genes clustered therein | Li et al. (2022b)
TGT | http://wheat.cau.edu.cn/TGT/ | A homology database, by integrating 12 Triticeae genomes and three outgroup model genomes and implemented versatile analysis and visualization functions | Chen et al. (2020)
Pretzel | https://plantinformatics.io/ | An interactive, web-based environment for navigating multi-dimensional wheat datasets, including genetic maps and chromosome-scale physical assemblies | Keeble-Gagnere et al. (2019)
wheatQTL | http://wheatqtldb.net/ | A QTL database of wheat | Singh et al. (2021)


The variations in fructosyltransferases on chromosomes 7A, 4A, 7D, 6A, 6B, and 6D are candidate genes in QTL that characterize fructan content in wheat grain and thus relate to quality/nutritional attributes of the grain (Zhang et al. 2008; Huynh et al. 2012; Langridge and Fleury 2012). The component fructosyltransferase genes in the 4A and 7D loci showed good alignment across LANCER, CS, and MACE except for an inversion relative to CS in the MACE locus, similar to that shown for the 7A locus (Fig. 9.1). The 6B and 6D loci carried the component fructosyltransferase genes, referred to as fructan 1-exohydrolase (1-FEH) in Zhang et al. (2008), and showed good alignment across LANCER, CS, and MACE. The 6A locus showed an inversion in MACE relative to CS and an absence of the locus in LANCER, consistent with the presence/absence polymorphism among wheat varieties for the 6A locus reported by Zhang et al. (2008).

In contrast to the locus carrying the fructosyltransferases, the wheat APO1 (WAPO-A1) locus on the long arm of 7A shows unambiguous alignments across the varieties examined (Fig. 9.2a, left-hand panel for entire chromosomes and right panel for the WAPO1 locus region); thus, structural variation, which needs to be considered when gene functions are examined, is not a significant factor here. Interestingly, the h1 and h2 haplotypes at this locus (Fig. 9.2b), identified by Voss-Fels et al. (2018) using SNP variation in the genome sequence, indicate striking sequence-level divergence in this WAPO1 gene region that is not reflected at the gene-gene syntenic level shown in Fig. 9.2a.



The genome viewer in Fig. 9.2b is from DAWN (Watson-Haigh et al. 2018) and shows the positions of SNP variation (colored drops) relative to the CHINESE SPRING RefSeq 2.1 reference, using cv. LANCER and cv. MACE from the wheat 10+ genome sequence dataset, and cv. XIAOYAN 54 and WESTONIA from whole-genome shotgun (WGS) resequencing data (Watson-Haigh et al. 2018). In field trials under rain-fed conditions, the SNP-based haplotype h2 was found to be significantly associated with increased grain yield, conferring a 24% yield advantage relative to all other haplotypes, especially h1, which was the other prominent haplotype in the field trial (Voss-Fels et al. 2019).

PRETZEL aims to address alignment problems and structural changes in cultivar sequences by providing an interactive, online environment for data visualization and analysis which, when loaded with appropriately curated data, can enable researchers with no bioinformatics training to exploit the latest genomic resources (Keeble-Gagnere et al. 2019). Apart from visualization, PRETZEL can be used to retrieve the genome information (features including markers, genes, annotations, etc.) of any selected chromosomal region as dataset files for further downstream analysis.

Fig. 9.1 Comparative analysis of the 7AS fructan locus. In a, the arrows indicate the location of the locus within the entire chromosome, and b and c are the images resulting from zooming into the locus. The marker genes TraesCS7A02G009100, TraesCS7A02G009200 through TraesCS7A02G010200 indicate the array of GH32 fructosyltransferases located at the locus in a ca. 750 kb region (c). Scaffold columns to the right side of the PRETZEL maps are important for checking aberrations in colinearity (based on sequence similarity of 70% over 70% of the length of the sequence), as discussed in the text, in terms of relating the boundaries of inverted regions to the boundaries of scaffolds in the assembly. In the region illustrated for MACE (b, c), the chance of the inverted region being an assembly error is reduced because the inversion is well within the respective scaffold
[Fig. 9.1 panel labels: cv Chinese Spring (CS); cv Mace; gene space]

Fig. 9.2 a PRETZEL view of the chr7A region (right panel) showing several genes including WAPO-A1 (Voss-Fels et al. 2019; Kuzay et al. 2019, 2022); structural changes in the WAPO-A1 gene across the three cvs. LANCER, CS, and MACE can be visualized at high resolution (right panel). b Genome viewer from DAWN (Watson-Haigh et al. 2018), showing variation relative to the CHINESE SPRING RefSeq 2.1 reference, using cv. LANCER and cv. MACE from the wheat 10+ genome sequence dataset, and cv. XIAOYAN 54 and WESTONIA from whole-genome shotgun (WGS) resequencing data
[Fig. 9.2 panel labels: a cv Lancer 7A, cv Chinese Spring 7A, cv Mace 7A; genes TraesCS7A02G481600 (wheat APO1), TraesCS7A02G481700, TraesCS7A02G481800, TraesCS7A02G481900. b Lancer (h1), Xiaoyan 54 (h1), Mace (h2), Westonia (h2)]


9.4.2 Knowledge Graphs


Knowledge graphs (KGs) are now extensively used to make search and information discovery more efficient. KnetMiner is a data integration platform that visualizes biological knowledge networks in an interactive web application (Hassani-Pak and Rawlings 2017). The data integration approach used to build KGs can capture complex biological relationships between genes, traits, diseases, and many more information types derived from curated or predicted information sources. For example, Rht24 is a recently discovered gene on chr6A associated with a semi-dwarf phenotype in wheat. KnetMiner identified the gene network of Rht24, partially shown in Fig. 9.3 for clarity. The Traes IDs of both the chr6B and chr6D homeologues are shown as interacting genes, and another gene, TraesCS5B02G265400, strongly interacts with Rht24. It can also be seen that the gene interacts with the bHLH27 transcription factor and physiologically influences the Gibberellin 20 pathway. Another feature is the identification of any stop/gain mutations in the CADENZA TILLING population; mutant names and SNP positions can also be visualized.
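The idea of a knowledge graph can be made concrete with a toy example. The sketch below stores the Rht24 relationships described above as subject-relation-object triples and retrieves the neighbours of a node. It is an illustration of the data structure only and does not use or reproduce the KnetMiner software or its interface; the relation names (`located_on`, `has_homeologue`, `interacts_with`, `influences`) and the homeologue labels are placeholders introduced here, since the text does not give the Traes IDs of the 6B and 6D copies.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

# Relationships taken from the Rht24 description in the text; relation names
# and homeologue labels are illustrative placeholders.
TRIPLES: List[Triple] = [
    ("Rht24", "located_on", "chr6A"),
    ("Rht24", "has_homeologue", "Rht24 homeologue (chr6B)"),
    ("Rht24", "has_homeologue", "Rht24 homeologue (chr6D)"),
    ("Rht24", "interacts_with", "TraesCS5B02G265400"),
    ("Rht24", "interacts_with", "bHLH27"),
    ("Rht24", "influences", "Gibberellin 20 pathway"),
]


def build_index(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Index triples by node, treating every edge as traversable both ways."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for subject, relation, obj in triples:
        index[subject].append((relation, obj))
        index[obj].append((relation, subject))
    return index


def neighbours(
    index: Dict[str, List[Tuple[str, str]]],
    node: str,
    relation: Optional[str] = None,
) -> List[str]:
    """Nodes linked to `node`, optionally restricted to one relation type."""
    return sorted(
        other
        for rel, other in index.get(node, [])
        if relation is None or rel == relation
    )


if __name__ == "__main__":
    graph = build_index(TRIPLES)
    print(neighbours(graph, "Rht24", "interacts_with"))
    print(neighbours(graph, "bHLH27"))
```

Typed edges are what allow a query such as "all genes interacting with Rht24" to be separated from "all pathways influenced by Rht24", which is the main advantage of a graph over a flat gene list.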

The causal mutation of Rht24 on chr6A was identified in the exome capture data of the global hexaploid wheat collection (He et al. 2019). The frequencies of the wild-type and alternate alleles of the target SNP among global wheat accessions were plotted using the SnpHub portal (Fig. 9.4).
Fig. 9.3 KnetMiner network depicting connections with Rht-24 on chr6A in wheat. This wheat reduced height gene, Rht-24, its homeologs on the B- and D-genomes along with other genes in cross-talk like TraesCS5B02G265400, associated transcription factors, and the mutations in the wheat TILLING population (e.g. two mutations in the CADENZA TILLING population) can be visualized. Not all connections present in the KnetMiner network are depicted in the figure; only a subset is shown for clarity

Fig. 9.4 SnpHub-based global haplotype map of a non-synonymous mutation in Rht-24, plotted based on the global exome sequencing data. In each pie chart, the red proportion represents the frequency of the wild-type allele, while the blue proportion represents the frequency of the non-synonymous mutation associated with reduced height
[Fig. 9.4 panel title: Haplotype map; axes: Longitude, Latitude]
9.4.3 SnpHub Portal for Global Overview of Functional Gene Frequencies


The SnpHub portal is a convenient way to identify mutations in the wheat genomes and then plot the frequency of the SNPs country-wise in the global wheat population (Wang et al. 2020). It is a Shiny/R-based platform for mining and visualizing large genome variation datasets in wheat. Genome variation data in the form of .vcf files and genome annotation files can be accessed by the chromosomal interval of a specific gene (Traes ID) to visualize genomic variation as heatmaps, phylogenetic trees, haplotype networks, and haplotype geographic maps.
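The country-wise frequencies behind a haplotype geographic map are simple proportions. The sketch below is an illustration under stated assumptions and is not SnpHub code: it assumes that each accession has already been reduced to a (country, allele) pair for one SNP, for example after extracting the relevant record from a .vcf file, and the country names and allele labels are invented placeholders.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def allele_frequency_by_country(
    samples: Iterable[Tuple[str, str]],
) -> Dict[str, Dict[str, float]]:
    """Return {country: {allele: frequency}} from (country, allele) pairs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for country, allele in samples:
        counts[country][allele] += 1
    frequencies = {}
    for country, counter in counts.items():
        total = sum(counter.values())
        frequencies[country] = {
            allele: n / total for allele, n in counter.items()
        }
    return frequencies


if __name__ == "__main__":
    # Invented accessions: one SNP scored as wild-type or alternate allele.
    accessions = [
        ("Country A", "wild-type"),
        ("Country A", "alternate"),
        ("Country A", "alternate"),
        ("Country A", "alternate"),
        ("Country B", "wild-type"),
        ("Country B", "wild-type"),
    ]
    for country, freqs in allele_frequency_by_country(accessions).items():
        print(country, freqs)
```

Each inner dictionary corresponds to one pie chart on a map such as Fig. 9.4; small sample sizes per country make such proportions unstable, so the number of accessions should be reported alongside them.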

Apart from these platforms, several other platforms can be used interactively to mine genome variation and gene expression data (Table 9.3). expVIP is an excellent resource for gene expression studies across various tissues and experiments, in which the expression of selected genes can be visualized as heatmaps or downloaded as data files for further analysis. Similarly, WheatOmics (Ma et al. 2021) provides several features for the analysis of genes, including JBrowse with distinct tracks for several SNP genotyping and exome sequencing resources, a Traes ID converter, and a sequence retriever. Last but not least, a wheat QTL database has been released recently, which is an important resource for aligning QTL information with the IWGSC reference sequence (Singh et al. 2021).


9.5 Conclusion and Prospects

The complete annotation of functional genes in wheat is a challenge at multiple levels. In a detailed analysis of the wheat genome space, Choulet et al. (this volume, Chap. 4) emphasized that an important intrinsic feature of eukaryote gene structure that impacts annotation is the degree of fragmentation, measured as the number of exons per gene: as a CDS is fragmented into several exons, the difficulty of predicting the correct intron/exon structure increases. In wheat (RefSeq Annotation v2.1), the average number of exons per CDS is only 4, although some genes (up to 10%) can have up to 17 exons. In this chapter, we have assigned genes and QTL to the reference genome and utilized available annotations to significantly improve the value of the outputs as reference documentation to be used in wheat breeding. The alignment of traits to annotated genes in the reference genome provides their position and Traes IDs to define a framework for establishing more informative markers for selecting lines to be deployed in crosses as well as for tracking targeted traits in segregating progeny from crosses.

Integration of a range of datasets has been emphasized in this chapter in order to deal with the complexity of the wheat genome and to generate robust associations between genome haplotypes and agronomic traits for selecting parents for crossing and accurately tracking progeny from crosses. Since only 17% of genes are single copy, most key agronomic traits are likely to be the product of gene network interactions involving genes/gene families distributed across the chromosomes of the A-, B-, and D-subgenomes and genome signatures (haplotypes).

The sequencing data generated from cultivated and wild wheats, natural and breeding populations, and mutants are enabling the discovery of genes underpinning important traits of breeding interest. This information is useful to further develop and deploy diagnostic markers for use in wheat breeding. Wheat genome variation is very complex for downstream analysis; therefore, data analytics platforms have been developed to visualize genome variation and expression as heatmaps, haplotype and geographic maps, and gene networks. We have provided an overview of the gene frameworks discovered so far in wheat, and these need to be integrated with the thousands of QTL that have been discovered in wheat in different mapping populations and with many different marker platforms. The integration of wheat QTL information with genome visualization platforms for better understanding of gene networks and trait discovery is a key challenge.
Rapid Cloning of Disease Resistance Genes in Wheat


Katherine L. D. Running and Justin D. Faris 


Abstract 


Wheat is challenged by rapidly evolving 
pathogen populations, resulting in yield 
losses. Plants use innate immune systems 
involving the recognition of pathogen effec- 
tors and subsequent activation of defense 
responses to respond to pathogen infections. 
Understanding the genes, genetic networks, 
and mechanisms governing plant-pathogen 
interactions is key to the development of vari- 
eties with robust resistance whether through 
conventional breeding techniques coupled 
with marker selection, gene editing, or other 
novel strategies. With regards to plant-path- 
ogen interactions, the most useful targets 
for crop improvement are the plant genes 
responsible for pathogen effector recognition, 
referred to as resistance (R) or susceptibility 
(S) genes, because they govern the plant’s 
defense response. Historically, the molecu- 
lar identification of R/S genes in wheat has 
been extremely difficult due to the large 
and repetitive nature of the wheat genome. 
However, recent advances in gene cloning 
methods that exploit reduced representa- 
tion sequencing methods to reduce genome 
complexity have greatly expedited R/S gene 
cloning in wheat. Such rapid cloning meth- 
ods referred to as MutRenSeq, AgRenSeq, 
k-mer GWAS, and MutChromSeq allow the 
identification of candidate genes without the 
development and screening of high-resolu- 
tion mapping populations, which is a highly 
laborious step often required in traditional 
positional cloning methods. These new clon- 
ing methods can now be coupled with a wide 
range of wheat genome assemblies, addi- 
tional genomic resources such as TILLING 
populations, and advances in bioinformatics 
and data analysis, to revolutionize the gene 
cloning landscape for wheat. Today, 58 R/S 
genes have been identified with 42 of them 
having been identified in the past six years 
alone. Thus, wheat researchers now have 
the means to enhance global food security 
through the discovery of R/S genes, paving 
the way for rapid R gene deployment or S 
gene elimination, manipulation through gene 
editing, and understanding wheat-pathogen 
interactions at the molecular level to guard 
against crop losses due to pathogens. 
10.1 Introduction 

Pathogens and pests pose a significant threat to 
global food security, affecting not just primary 
yields, but also the stability and distribution 
of production and the quality of food (Savary 
et al. 2017). An estimated 21.47% of global 
wheat yields are lost annually due to pathogens 
and pests (Savary et al. 2019), equating to ~210 
million metric tons of grain per year, enough 
to bake 290 billion loaves of bread (Wulff and 
Krattinger 2022). Combining agronomic prac- 
tices that reduce the initial disease inoculum 
and infection rate with selection of genetically 
resistant varieties is an effective crop disease 
management strategy, and to develop geneti- 
cally resistant wheat, resistance (R) genes need 
to be identified, characterized, and deployed. In 
some diseases, for example tan spot or septoria 
nodorum blotch, susceptibility is conferred by 
dominant genes. In these cases, the priority is to 
remove or disrupt susceptibility (S) genes rather 
than deploy novel R genes. Gene cloning is cru- 
cial to the efficient deployment of R genes and 
removal of S genes, requiring the identification 
of the nucleotide sequence of a target gene and 
validating its function. Diversity and functional 
studies can assess the effects of genetic vari- 
ation within an R or S gene on their respective 
resistance/susceptibility, allowing researchers to 
develop molecular markers targeting the vari- 
ants, which can then be used to select breeding 
lines with the most beneficial alleles. Cloned R 
genes can also be introduced into modern culti- 
vars via gene complementation or cross-hybrid- 
ization, and S genes can be removed through 
marker-assisted elimination or gene editing. The 
methods and resources used to clone R and S 
genes are shared, and as such R and/or S genes 
will be referred to as “R/S genes” in this chapter. 
Although over 460 R/S genes in wheat have 
been described (Hafeez et al. 2021), only 58 
have been cloned (Table 10.1). The genome of 
hexaploid bread wheat is large and repetitive 
due, in part, to its evolutionary history, mak- 
ing it challenging to clone R/S genes. The basic 
seven-chromosome Triticeae progenitor split into 
the Triticum and Aegilops branches about 3 mil- 
lion years ago (MYA) (reviewed by Faris 2014). 
Modern-day bread wheat (Triticum aestivum 
ssp. aestivum L., 2n=6x=42, AABBDD) is 
an allohexaploid that evolved as a result of two 
amphiploidization events involving the hybridi- 
zation of two different species followed by spon- 
taneous chromosome doubling through meiotic 
restitution division, several mutations, and inter- 
specific gene flow. Around 0.5 MYA the wild 
diploid species T. urartu Tumanian ex Gandylian 
(2n=2x=14, AA) hybridized with a species similar 
to Aegilops speltoides Tausch (2n=2x=14, 
SS) to form tetraploid wheat Triticum turgidum 
ssp. dicoccoides Thell (2n=4x=28, AABB), 
also known as wild emmer. T. turgidum ssp. 
durum (2n=4x=28, AABB), durum wheat, is 
a free-threshing derivative of T. turgidum ssp. 
dicoccoides, and it is today widely cultivated and 
used to make pasta and other semolina-based 
products. The second amphiploidization event 
occurred around 8000 years ago. A T. turgidum 
ssp. and the diploid wild goat grass Aegilops 
tauschii Coss. (2n=2x=14, DD) hybridized 
to form hexaploid (common or bread) wheat T. 
aestivum (2n=6x=42, AABBDD). Due to the 
differential presence of Ae. tauschii lineage spe- 
cific sequences in modern cultivars, it is possible 
that more than one hybridization event occurred 
between T. turgidum spp. and Ae. tauschii 
(Gaurav et al. 2022). Together, bread and durum 
wheat provide about 18% of the caloric intake 
of humans worldwide, but in some regions of 
the world, wheat accounts for over a third of the 
caloric and protein intake (Erenstein et al. 2022). 
Despite their polyploid nature, bread and 
durum wheat behave like diploid plants geneti- 
cally, with homologous chromosomes pairing 
and segregating during meiosis. The pairing of 
homoeologous chromosomes is prevented by 
genes Ph1 and Ph2 (Riley and Chapman 1958; 
Sears and Okamoto 1958; Mello-Sampayo and 
Lorente 1968) with the resulting diploid-like 
pairing of wheat chromosomes in meiosis sim- 
plifying segregation studies and genetic map- 
ping of traits. Due to their formation through 
amphiploidization, hexaploid and tetraploid 
wheats often have three or two copies of each 
gene, respectively, called homoeologous genes. 
Homoeologous genes are often highly conserved, 
with ~97% identity across their coding 
regions (Schreiber et al. 2012), and this high 
sequence conservation among homoeologous 
genes hinders the development of homoeolog- 
specific molecular markers. Additionally, 
approximately 85% of the wheat genome is 
composed of repetitive elements (Wicker et al. 
2018), making it difficult to design molecular 
markers that only target one locus for use in 
molecular mapping or marker-assisted selection. 

Bread and durum wheat genomes are relatively 
large at 17 and 12 Gb, respectively 
(Bennett and Smith 1976). The sequencing 
and assembly of such large genomes are com- 
putationally challenging and further compli- 
cated by the highly repetitive nature of wheat 
genomes and interchromosomal gene duplications 
(IWGSC et al. 2014). The complexity of 
the wheat genome has hampered the generation 
of genomic data and bioinformatic analysis. 
Despite the challenges, multiple high-quality 
genome assemblies have been constructed 
(Table 10.2). Genome assemblies are used to 
design molecular markers and bait libraries, 
assess candidate genes, and evaluate structural 
variation as well as acting as a foundation for 
developing genomic resources and tools that aid 
in the cloning of R/S genes. 

The first cloned S and R genes in wheat, 
TaMlo-B1 and Lr10, were published in 2002 
(Elliott et al. 2002) and 2003 (Feuillet et al. 
2003), respectively. Since then, 48 more R/S 
genes have been cloned from Triticum or 
Aegilops species, and an additional eight R/S 
genes have been cloned from related species 
and shown to be functional in wheat (Table 
10.1, current as of 8/1/2022). In just the last two 
years, more R/S genes were cloned than were 
cloned in the first decade of R/S gene cloning. 
Here, we review the surge of genomic resources 
and gene cloning methods that have contributed 
to the acceleration of R/S gene cloning in wheat. 


10.2 Advances in Wheat Genome 
Sequencing 


High-quality genomic sequences and assem- 
blies act as the basis for gene cloning efforts in 
wheat, and the recognition of this requirement 
led to the formation of the International Wheat 
Genome Sequencing Consortium (IWGSC) in 
2005. Several hexaploid, tetraploid, and dip- 
loid Triticum full genome assemblies have been 
released in the last five years (Table 10.2). The 
bread wheat variety CHINESE SPRING was 
selected for sequencing due to the extensive 
genetic and molecular resources developed 
using this variety (Gill et al. 2004), including 
aneuploid stocks developed by Ernie Sears 
that could be used to physically map genes and 
markers to specific chromosomes (Sears 1954, 
1966; Sears and Sears 1978). Segmental dele- 
tion lines (Endo and Gill 1996) further specified 
physical regions within chromosomal arms and 
were used to map 16,000 expressed sequence 
tag (EST) loci (Qi et al. 2004). 

Hexaploid wheat was estimated to be 17 Gb 
and included families of DNA sequences that 
were highly repetitive (Bennett and Smith 1976). 
A reduced-representation sequencing approach 
was used to reduce the genome complexity 
and size (IWGSC et al. 2014), making use of 
CHINESE SPRING ditelosomic stocks devel- 
oped by Sears and Sears (1978) to isolate each 
chromosome arm by flow cytometry, and BAC 
libraries were subsequently constructed from the 
DNA of individual arms. The bin-mapped ESTs 
were used to assess the purity of the sorted chro- 
mosome fractions (Qi et al. 2004). Short read 
paired-end sequences of each BAC library were 
Table 10.2 Triticum and Aegilops assemblies 

Species | Genotype | Year | Genomes | Type | Reference | DOI or link 
Ae. tauschii | AL8/78 | 2013 | D | Scaffold | Jia et al. (2013) | https://doi.org/10.1038/nature12028 
T. urartu | G1812/PI428198 | 2013 | A | Scaffold | Ling et al. (2013) | https://doi.org/10.1038/nature11997 
T. turgidum ssp. durum | CAPPELLI | 2014 | AB | Scaffold | IWGSC et al. (2014) | https://doi.org/10.1126/science.1251788 
T. aestivum | CHINESE SPRING | 2014 | B | Pseudomolecule | Choulet et al. (2014) | https://doi.org/10.1126/science.1249721 
T. aestivum | CHINESE SPRING | 2014 | ABD | Scaffold | IWGSC et al. (2014) | https://doi.org/10.1126/science.1251788 
Ae. speltoides | ERX391140 | 2014 | SS | Scaffold | IWGSC et al. (2014) | https://doi.org/10.1126/science.1251788 
T. turgidum ssp. durum | STRONGFIELD | 2014 | AB | Scaffold | IWGSC et al. (2014) | https://doi.org/10.1126/science.1251788 
Synthetic hexaploid | W7984 | 2015 | ABD | Scaffold | Chapman et al. (2015) | https://doi.org/10.1186/s13059-015-0582-8 
T. aestivum | CHINESE SPRING doubled haploid (Dv418) | 2017 | ABD | Scaffold | Zimin et al. (2017a) | https://doi.org/10.1093/gigascience/gix097 
Ae. tauschii | AL8/78 | 2017 | D | Pseudomolecule | Luo et al. (2017) | https://doi.org/10.1038/nature24486 
Ae. tauschii | AL8/78 | 2017 | D | Pseudomolecule | Zhao et al. (2017) | https://doi.org/10.1038/s41477-017-0067-8 
Ae. tauschii | AL8/78 | 2017 | D | Scaffold | Zimin et al. (2017b) | https://doi.org/10.1101/gr.213405.116 
T. aestivum | CHINESE SPRING | 2017 | ABD | Scaffold | Clavijo et al. (2017) | https://doi.org/10.1101/gr.217117.116 
T. turgidum ssp. durum | KRONOS | 2017 | AB | Scaffold | N/A | http://opendata.earlham.ac.uk/Triticum_turgidum/ 
T. turgidum ssp. dicoccoides | ZAVITAN | 2017 | AB | Pseudomolecule | Avni et al. (2017) | https://doi.org/10.1126/science.aan0032 
T. aestivum | CHINESE SPRING | 2018 | ABD | Pseudomolecule | IWGSC et al. (2018) | https://doi.org/10.1126/science.aar7191 
T. urartu | G1812/PI428198 | 2018 | A | Pseudomolecule | Ling et al. (2018) | https://doi.org/10.1038/s41586-018-0108-0 
T. turgidum ssp. durum | SVEVO | 2019 | AB | Pseudomolecule | Maccaferri et al. (2019) | https://doi.org/10.1038/s41588-019-0381-3 
T. turgidum ssp. dicoccoides | ZAVITAN | 2019 | AB | Pseudomolecule | Zhu et al. (2019) | https://doi.org/10.1534/g3.118.200902 
T. aestivum | 2670/PI 190962 | 2020 | ABD | Pseudomolecule | Walkowiak et al. (2020) | https://doi.org/10.1038/s41586-020-2961-x 
T. aestivum | ARINA-LRFOR | 2020 | ABD | Pseudomolecule | Walkowiak et al. (2020) | https://doi.org/10.1038/s41586-020-2961-x 
T. aestivum | CADENZA | 2020 | ABD | Scaffold | Walkowiak et al. (2020) | https://doi.org/10.1038/s41586-020-2961-x 
T. aestivum | CDC LANDMARK | 2020 | ABD | Pseudomolecule | Walkowiak et al. (2020) | https://doi.org/10.1038/s41586-020-2961-x 
T. aestivum | CDC STANLEY | 2020 | ABD | Pseudomolecule | Walkowiak et al. (2020) | https://doi.org/10.1038/s41586-020-2961-x 
T. aestivum | CLAIRE | 2020 | ABD | Scaffold | Walkowiak et al. (2020) | https://doi.org/10.1038/s41586-020-2961-x 
Table 10.2 (continued) 

Species | Genotype | Year | Genomes | Type | Reference | DOI or link 
T. aestivum | JAGGER | 2020 | ABD | Pseudomolecule | Walkowiak et al. (2020) | https://doi.org/10.1038/s41586-020-2961-x 
T. aestivum | JULIUS | 2020 | ABD | Pseudomolecule | Walkowiak et al. (2020) | https://doi.org/10.1038/s41586-020-2961-x 
T. aestivum | LONGREACH-LANCER | 2020 | ABD | Pseudomolecule | Walkowiak et al. (2020) | https://doi.org/10.1038/s41586-020-2961-x 
T. aestivum | MACE | 2020 | ABD | Pseudomolecule | Walkowiak et al. (2020) | https://doi.org/10.1038/s41586-020-2961-x 
T. aestivum | NORIN 61 | 2020 | ABD | Pseudomolecule | Walkowiak et al. (2020) | https://doi.org/10.1038/s41586-020-2961-x 
T. aestivum | PARAGON | 2020 | ABD | Scaffold | Walkowiak et al. (2020) | https://doi.org/10.1038/s41586-020-2961-x 
T. aestivum | ROBIGUS | 2020 | ABD | Scaffold | Walkowiak et al. (2020) | https://doi.org/10.1038/s41586-020-2961-x 
T. aestivum | SY MATTIS | 2020 | ABD | Pseudomolecule | Walkowiak et al. (2020) | https://doi.org/10.1038/s41586-020-2961-x 
T. aestivum | WEEBILL 1 | 2020 | ABD | Scaffold | Walkowiak et al. (2020) | https://doi.org/10.1038/s41586-020-2961-x 
T. aestivum ssp. tibetanum Shao | ZANG1817 | 2020 | ABD | Pseudomolecule | Guo et al. (2020) | https://doi.org/10.1038/s41467-020-18738-5 
Ae. tauschii | AL8/78 | 2021 | D | Pseudomolecule | Wang et al. (2021) | https://doi.org/10.1093/g3journal/jkab325 
Ae. tauschii | AY17 | 2021 | D | Pseudomolecule | Zhou et al. (2021) | https://doi.org/10.1038/s41477-021-00934-w 
Ae. tauschii | AY61 | 2021 | D | Pseudomolecule | Zhou et al. (2021) | https://doi.org/10.1038/s41477-021-00934-w 
T. aestivum | CHINESE SPRING (RefSeq v2.1) | 2021 | ABD | Pseudomolecule | Zhu et al. (2021) | https://doi.org/10.1111/tpj.15289 
T. aestivum | FIELDER | 2021 | ABD | Pseudomolecule | Sato et al. (2021) | https://doi.org/10.1093/dnares/dsab008 
T. aestivum | RENAN | 2021 | ABD | Pseudomolecule | Aury et al. (2022) | https://doi.org/10.1093/gigascience/giac034 
Ae. tauschii | T093 | 2021 | D | Pseudomolecule | Zhou et al. (2021) | https://doi.org/10.1038/s41477-021-00934-w 
Ae. tauschii | XJ02 | 2021 | D | Pseudomolecule | Zhou et al. (2021) | https://doi.org/10.1038/s41477-021-00934-w 
Ae. longissima | AEG-6782-2 | 2022 | S^l | Pseudomolecule | Avni et al. (2022) | https://doi.org/10.1111/tpj.15664 
Ae. speltoides | AEG-9674-1 | 2022 | S | Pseudomolecule | Avni et al. (2022) | https://doi.org/10.1111/tpj.15664 
Ae. sharonensis | AS_1644 | 2022 | S^sh | Pseudomolecule | Yu et al. (2022) | https://doi.org/10.1038/s41467-022-29132-8 
T. aestivum | KARIEGA | 2022 | ABD | Pseudomolecule | Athiyannan et al. (2022) | https://doi.org/10.1038/s41588-022-01022-1 
T. aestivum | SONMEZ | 2022 | ABD | Pseudomolecule | Akpinar et al. (2022) | https://doi.org/10.21203/rs.3.rs-1095548/v1 
T. aestivum | ATTRAKTION | 2022 | ABD | Pseudomolecule | Kale et al. (2022) | https://doi.org/10.1111/pbi.13843 
Ae. bicornis | TB01 | 2022 | S^b | Pseudomolecule | Li et al. (2022) | https://doi.org/10.1016/j.molp.2021.12.019 


Species Genotype Year Genomes 
Ae. searsii TEOI 20222 | S 

Ae. sharonensis TH02 20228 SSH 

Ae. longissima TLOS 2022 S! 

Ae. speltoides TSOL 2022 S 


assembled resulting in a 10.2 Gb draft assem- 
bly referred to as the Chinese Spring Survey 
Sequences (CSS) and represented 61% of the 
genome sequence (IWGSC et al. 2014). 

A pseudomolecule level assembly of chro- 
mosome 3B was produced separately using a 
minimum tiling path of 8,452 BACs sequenced 
with Roche/454 paired-end reads (Choulet et al. 
2014). After scaffold assembly, [lumina reads 
from flow sorted chromosome 3B were used 
to fill gaps. A detailed SNP-based genetic map 
from the CHINESE SPRING x RENAN popu- 
lation was used to orient and order scaffolds. 
Ultimately, the pseudomolecule level assembly 
represented 93% of chromosome 3B. A total of 
124,201 high-confidence gene loci were anno- 
tated in the CSS and chromosome 3B assembly 
(IWGSC et al. 2014). 

Whole genome shotgun (WGS) assemblies 
of the Triticum turgidum ssp. durum cultivars 
CAPPELLI and STRONGFIELD were released 
in 2014 alongside an assembly of Ae. spel- 
toides accession ERX391140 (SS) (IWGSC 
et al. 2014). Although these assemblies con- 
sisted of numerous small contigs with unknown 
order, orientation, and space between contigs, 
partly due to the piling of repetitive elements, 
they offer a draft assembly of low-copy DNA 
and therefore can be used to identify alleles, 
design gene-specific markers, or compare genes 
and gene families among assemblies. Chapman 
et al. (2015) integrated WGS and genetic map- 
ping to assemble and order contigs of the syn- 
thetic hexaploid W7984. Despite the WGS 
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Type Reference Doi or link 
Pseudomolecule Li et al. https://doi. 
(2022) org/10.1016/j. 
molp.2021.12.019 
Pseudomolecule Li et al. https://doi. 
(2022) org/10.1016/j. 
molp.2021.12.019 
Pseudomolecule Li et al. https://doi. 
(2022) org/10.1016/j. 
molp.2021.12.019 
Pseudomolecule Li et al. https://doi. 
(2022) org/10.1016/j. 


molp.2021.12.019 


method and lack of chromosome isolation via 
flow sorting, the assembly was 9.1 Gb, just 
1.1 Gb smaller than the CSS assembly. 

With the growth of sequencing and assembly 
methods, more wheat scaffold and pseudomol- 
ecule level assemblies became available (Figs. 
10.1 and 10.2). As of August 2022, 46 unique 
accessions have scaffold and/or pseudomol- 
ecule level assemblies (Table 10.2). In 2020, 
there was a significant increase in the number 
of hexaploid accessions with pseudomolecule 
or scaffold level assemblies. Through a large 
international collaborative effort, Walkowiak 
et al. (2020) published the ‘10+ Wheat Genomes’ 
paper, which included pseudomolecule assem- 
blies of nine bread wheat lines and one T. aes- 
tivum ssp. spelta accession plus the scaffold 
level assemblies of five additional bread wheat 
Fig. 10.1 Cumulative accessions with pseudomolecule 
level assemblies. Color corresponds to the subgenome of 
the accession 
Fig. 10.2 Cumulative accessions with scaffold level 
assemblies. Color corresponds to the subgenome of the 
accession 


lines. Prior to this, CHINESE SPRING and the 
synthetic hexaploid W7984 were the only hexa- 
ploids with either a pseudomolecule or scaffold 
level assembly. Principal component analysis 
of exome sequence capture alleles in ~1200 
hexaploid accessions revealed that CHINESE 
SPRING was genetically distant from other 
hexaploid wheats (Walkowiak et al. 2020). The 
accessions included in the Walkowiak et al. 
(2020) paper were selected to more accurately 
represent the full diversity of hexaploid wheat 
allowing analysis of intergenome variability. 
The genome of the Tibetan semi-wild wheat 
(T. aestivum ssp. tibetanum Shao) accession 
ZANG1817 was also published the same year 
(Guo et al. 2020). 

Most of the Triticum and Aegilops assemblies 
and genome browsers are hosted on websites. 
Not all assemblies are hosted on a single web- 
site and different assembly and annotation ver- 
sions are available on different websites, so care 
should be taken when comparing assemblies 
or annotations from different sources. Many of 
these websites host additional resources that 
may be useful in the gene cloning and charac- 
terization process, such as molecular mark- 
ers, exome capture data, varietal SNPs, and 
TILLING mutants. 

The following are useful websites for access- 
ing the genome assemblies: 
• GrainGenes (Yao et al. 2022): https://wheat.pw.usda.gov 

• Ensembl Plants: http://plants.ensembl.org/Triticum_aestivum 

• URGI: https://urgi.versailles.inrae.fr/blast/ 

• Grassroots Infrastructure: https://grassroots.tools/service/blast-blastn 

• CerealsDB: https://www.cerealsdb.uk.net/cerealgenomics/CerealsDB/blast_WGS.php 


10.3 Map-based Cloning 


Map-based cloning was used to clone the first 
wheat R gene, Lr10 (Feuillet et al. 2003). Since 
then, map-based cloning has been the most fre- 
quently used method to clone R/S genes in 
wheat (around 50%, Table 10.1). Map-based 
cloning uses the genetic relationship between a 
gene and molecular markers to place a gene on 
a genetic map. Originally, an iterative approach 
termed chromosome walking was used to 
define the candidate gene region. The two clos- 
est molecular markers were then used to screen 
large-insert libraries of cloned fragments of 
DNA (yeast artificial chromosomes or bacte- 
rial artificial chromosomes, YACs or BACs) to 
identify flanking clones, and new markers devel- 
oped from the ends of the clones were used to 
rescreen the library and “walk” closer to the 
gene of interest until a clone containing the gene 
was identified. Sequencing of the clone(s) span- 
ning markers defined by flanking genetic recom- 
binants would reveal the nucleotide sequence of 
the R/S gene. While we still use the term “clon- 
ing,” the development and screening of large- 
insert genomic clones are seldom still necessary 
to clone a gene. The development of molecular 
markers and subsequent high-density, or satura- 
tion, mapping of target R/S genes in segregating 
populations is a critical step in the map-based 
cloning process. Historically, high-density 
mapping was conducted on a low-throughput 
basis using markers such as restriction frag- 
ment length polymorphisms (RFLPs), amplified 
fragment length polymorphisms (AFLPs), or 
simple-sequence repeats (SSRs, or microsatellites). 
Recent advances in high-throughput gen- 
otyping technologies such as Diversity Arrays 
Technology (DArT), DNA SNP arrays, custom 
Kompetitive allele-specific PCR (KASP) arrays, 
or genotyping by sequencing offer high-density 
genotyping at affordable costs. These genotyp- 
ing technologies can also be used in combina- 
tion with a bulked segregant analysis (BSA) 
approach to quickly find markers associated 
with a phenotype without having to genotype a 
large mapping population (see also Chap. 9). 
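
As a rough illustration of the BSA idea, the sketch below compares alternate-allele frequencies between a resistant and a susceptible bulk and flags markers at which the two bulks differ strongly. The marker names, read counts, and the 0.6 threshold are hypothetical values chosen for the example; they are not data or settings from any study cited in this chapter. 

```python
from typing import Dict, List, Tuple

# (reference-allele reads, alternate-allele reads) observed at one marker.
Counts = Tuple[int, int]


def alt_frequency(counts: Counts) -> float:
    """Fraction of reads that carry the alternate allele."""
    ref, alt = counts
    total = ref + alt
    if total == 0:
        raise ValueError("marker has no read coverage")
    return alt / total


def delta_frequency(
    resistant: Dict[str, Counts], susceptible: Dict[str, Counts]
) -> Dict[str, float]:
    """Allele-frequency difference (resistant minus susceptible) per shared marker."""
    shared = sorted(resistant.keys() & susceptible.keys())
    return {
        m: alt_frequency(resistant[m]) - alt_frequency(susceptible[m])
        for m in shared
    }


def linked_markers(deltas: Dict[str, float], threshold: float = 0.6) -> List[str]:
    """Markers whose absolute frequency difference meets the (assumed) threshold."""
    return [m for m, d in deltas.items() if abs(d) >= threshold]


# Hypothetical read counts for three SNP markers in each bulk.
resistant_bulk = {"snp_1": (48, 52), "snp_2": (5, 95), "snp_3": (50, 50)}
susceptible_bulk = {"snp_1": (51, 49), "snp_2": (92, 8), "snp_3": (47, 53)}

deltas = delta_frequency(resistant_bulk, susceptible_bulk)
print(linked_markers(deltas))
```

Unlinked markers sit near a difference of zero because both bulks sample the two parental alleles about equally, whereas a marker close to the gene is nearly fixed for opposite alleles in the two bulks. 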

The size of the candidate gene region, as 
defined by the genetic region between the clos- 
est markers flanking the R/S gene, is dependent 
on both the marker density and the recombina- 
tion rate. In a population of fixed size, such as 
a recombinant inbred or doubled haploid popu- 
lation, there is a finite number of recombina- 
tion events. Sometimes, there are not enough 
recombination events in a population to reduce 
the candidate gene region to a reasonable size. 
If the marker density is too low, recombination 
events can go undetected, resulting in a larger 
candidate gene region. Additional molecular 
markers in a region cosegregating with the gene 
will not increase resolution. Even in cases where 
marker density and recombination rate are high, 
a candidate gene region may be gene-rich, mak- 
ing it difficult to identify the trait-associated 
gene. Map-based cloning also requires access 
to the DNA sequence between the flanking 
markers. This need is often met by the multiple 
sequenced wheat genomes. It is important to 
remember that even if the sequenced wheat gen- 
otypes do not carry a functional allele of a target 
R/S gene, they may carry a nonfunctional allele. 
As such, it may be useful to identify candidate 
genes even in genotypes that do not display the 
desired resistant or susceptible phenotype. If the 
phenotypes of the sequenced wheat genomes 
are known, candidate genes may be eliminated 
based on a comparison of gene content between 
lines with and without the trait of interest 
(Running and Faris, unpublished). 
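
The dependence of the candidate region on recombination can be sketched in a few lines. The example below assumes a small, hypothetical population scored as parental allele A or B at four markers and for the trait; it counts recombinants between each marker and the trait and reports the closest recombinant markers on either side. Marker names, positions, and genotype strings are invented for illustration. 

```python
from typing import Dict, List, Optional, Tuple

Marker = Tuple[str, float]  # (marker name, physical position in Mb)


def recombinants(marker_calls: str, trait_calls: str) -> int:
    """Number of lines in which the marker allele differs from the trait allele."""
    if len(marker_calls) != len(trait_calls):
        raise ValueError("marker and trait must be scored on the same lines")
    return sum(m != t for m, t in zip(marker_calls, trait_calls))


def candidate_interval(
    markers: List[Marker], genotypes: Dict[str, str], trait: str
) -> Tuple[Optional[Marker], Optional[Marker], List[str]]:
    """Return (left flank, right flank, cosegregating marker names).

    Assumes the cosegregating markers form one contiguous block.
    """
    ordered = sorted(markers, key=lambda m: m[1])
    counts = [recombinants(genotypes[name], trait) for name, _ in ordered]
    coseg = [i for i, c in enumerate(counts) if c == 0]
    if not coseg:
        raise ValueError("no marker cosegregates with the trait")
    left_idx, right_idx = coseg[0] - 1, coseg[-1] + 1
    left = ordered[left_idx] if left_idx >= 0 else None
    right = ordered[right_idx] if right_idx < len(ordered) else None
    return left, right, [ordered[i][0] for i in coseg]


# Six hypothetical lines; each string holds one allele call per line.
trait = "AABBAB"
markers = [("m1", 10.0), ("m2", 12.5), ("m3", 13.0), ("m4", 15.0)]
genotypes = {
    "m1": "ABBBAB",  # one recombinant (second line)
    "m2": "AABBAB",  # cosegregates
    "m3": "AABBAB",  # cosegregates
    "m4": "AABBAA",  # one recombinant (sixth line)
}

left, right, coseg = candidate_interval(markers, genotypes, trait)
print(left, right, coseg)
```

In this toy case m2 and m3 both cosegregate with the trait, so adding either one does not shrink the interval between m1 and m4; only additional recombinants would. 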

If the sequenced wheat genotypes do not 
carry an allele of the R/S gene, or when the R/S 
gene is in an area of low recombination, such as 
an introgressed segment from a wild relative or 
near a centromere, alternative gene cloning methods 
may be more appropriate. Map-based cloning 
can also be slow because it depends on the 
generation of a mapping population and requires 
screening thousands of recombinant gametes. 


10.4 Reduced-Representation 
Sequencing Methods 


Reduced-representation sequencing (RRS) is a 
key step in the rapid cloning methods that are 
used in wheat (described below), and it can be 
combined with traditional map-based cloning 
methods to quickly identify candidate genes. 
RRS reduces genome complexity and therefore 
the cost and time of sequencing and analysis. 
The three main methods of RRS are transcrip- 
tome or RNA sequencing, exome capture, and 
chromosome flow sorting (Fig. 10.3). These 
methods allow preferential sequencing of more 
relevant spaces, either genic regions or pro- 
moters, or the specific chromosome containing 
an R/S gene. In some cases, RRS methods are 
incorporated into rapid cloning methods. 


10.4.1 Exome Capture 


In exome capture, the baits, or capture probes, 
hybridize to the targets and then are bound 
by streptavidin-coated magnetic beads. The 
magnetic beads are “captured” by a magnet, 
unbound DNA is washed away, and the remain- 
ing target-enriched library is amplified and 
sequenced. Capture probe assays can target 
genes, promoters, and even specific types of 
genes like nucleotide-binding domain leucine- 
rich repeat containing genes (NLRs). Exome 
capture assays targeting the genic regions of 
wheat have been designed from the sequenced 
wheat genomes, each using an increasing 
design space size as additional wheat genome 
sequences became available. 

Jordan et al. (2015) designed an exome cap- 
ture probe assay called the “wheat exome cap- 
ture” (WEC) using a design space of 110 Mb 
Transcriptome sequencing. RNA is isolated from tissue 
and reverse transcribed into cDNA, which is sequenced 
and mapped to a reference assembly. b Exome sequenc- 
ing. DNA is isolated from tissue and a DNA sequencing 
library is prepared. Short biotinylated baits complemen- 
tary to the targets hybridize to the DNA, bind to mag- 
netic beads, and are captured by a magnet, yielding a 


from a 3.8 Gb low-copy number genome assem- 
bly of CHINESE SPRING (Brenchley et al. 
2012). To identify genic regions, they aligned 
reported wheat cDNA and EST sequences 
and conducted a BLASTn search using 
Brachypodium exon sequences. Krasileva et al. 
(2017) designed T. turgidum and T. aestivum 
exome capture probes to target gene annotations 
from the CSS assembly, transcripts from tran- 
scriptome studies, and unannotated homologs 


can target the whole exome or a particular gene fam- 
ily such as NLRs as is done in the RenSeq method. c 
Chromosome flow sorting. Liquid suspensions of mitotic 
chromosomes collected from dividing root cells are 
fluorescently labeled and separated using flow cytom- 
etry based on the fluorochrome signal and relative DNA 
content 


of barley in wheat. The exome capture probes 
targeted 85 Mb. Following the publication of 
high-quality reference wheat genome assemblies 
and annotations in 2017 and 2018, Gardiner 
et al. (2019) discovered that the existing exome 
capture assay only targeted 32.6% of the high-confidence 
gene set of wheat. Using the high-confidence 
annotated genes in the CHINESE 
SPRING-TGACv1 and RefSeq v1 genome 
assemblies, Ae. tauschii assembly Aet v4.0, and 
the T. turgidum ssp. dicoccoides WEWSeq v1.0 
assembly, they designed exome capture probe 
sets targeting genes and putative promoters. 
Probes of ~75 bp were designed approximately 
every 120 bp across 786 Mb of design space, of 
which 509 Mb was gene space, and 277 Mb was 
putative promoter sequences. The exome capture 
and promoter capture probe sets designed by 
Jordan et al. (2015), Krasileva et al. (2017), and 
Gardiner et al. (2019) were available through 
NimbleGen (Roche) but have since been discon- 
tinued. The most recent exome capture assay, 
the myBaits® Expert Wheat Exome capture, 
designed using the CHINESE SPRING-RefSeq 
v1.0 assembly, captures over 250 Mb of coding 
sequence (Daicel Arbor Biosciences). 
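
To make the probe-spacing figures concrete, the following sketch tiles fixed-length probes at a fixed step across a target and gives a back-of-envelope probe count for a design space of a given size. The function names are hypothetical, the 75 bp length and 120 bp spacing are taken as exact for simplicity, and the resulting count is an estimate rather than the actual number of probes in any published assay. 

```python
from typing import List, Tuple


def tile_probes(
    target_length: int, probe_length: int = 75, step: int = 120
) -> List[Tuple[int, int]]:
    """Half-open (start, end) coordinates of probes that fit inside the target."""
    if probe_length <= 0 or step <= 0:
        raise ValueError("probe_length and step must be positive")
    last_start = target_length - probe_length
    return [(s, s + probe_length) for s in range(0, last_start + 1, step)]


def estimated_probe_count(design_space_bp: int, step: int = 120) -> int:
    """Rough probe count if one probe starts every `step` bases."""
    return design_space_bp // step


def covered_fraction(probe_length: int = 75, step: int = 120) -> float:
    """Share of bases directly under a probe when probes do not overlap."""
    return min(probe_length, step) / step


print(tile_probes(500))
print(estimated_probe_count(786_000_000))
print(covered_fraction())
```

With these assumptions, a 786 Mb design space corresponds to roughly 6.5 million probes, and about five-eighths of the targeted bases lie directly under a probe; the gaps are typically still recovered because captured fragments extend beyond the probe itself. 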

To further reduce genome complexity, capture 
probe assays can be developed to target a 
particular gene class such as NLRs. NLRs are 
the most common class of cloned R/S gene in 
wheat (Table 10.1), and the wheat pangenome 
is estimated to contain 6-8 thousand NLR 
genes (Walkowiak et al. 2020). Exome capture 
of NLR genes and subsequent sequencing is 
termed Resistance gene enrichment Sequencing 
(RenSeq). The first R genes cloned using 
RenSeq were Rpi-ber2 and Rpi-rzc1, which con- 
fer resistance against Phytophthora infestans 
infections in potato (Jupe et al. 2013). Since 
then, RenSeq has been incorporated into the rapid 
cloning methods MutRenSeq and AgRenSeq 
(discussed below). RenSeq was also recently 
combined with BSA in a method termed BSR-Seq 
(Lin et al. 2022). RenSeq was applied to 
DNA pools of resistant and susceptible plants 
allowing the identification of SNPs in NLRs 
linked to resistance. RenSeq is a key method 
in multiple rapid cloning strategies, efficiently 
enriching NLR genes. Kale et al. (2022) found 
the Triticeae RenSeq Baits V3 probe set (Zhang 
et al. 2021a) resulted in target enrichment of 
220-fold of 18 Mb of NLR genes annotated in 
CHINESE SPRING-RefSeq v1.0. However, 
because probes were designed to target previ- 
ously annotated NLR genes, RenSeq captures 
are biased and may not capture unannotated 
NLRs, i.e., NLRs not present in the sequenced 
and annotated genome assemblies. RenSeq also 
relies on the assumption that the target R/S gene 
is a member of the NLR class. If it is suspected 
that the target gene might belong to a different 
class, then other methods should probably be 
considered. 


10.4.2 Transcriptome Sequencing 


Transcriptome sequencing, or RNA-Seq, is a 
less biased RRS method as it is not limited to 
previously annotated genes and/or a gene family. 
RNA-Seq combined with BSA (BSR-Seq) was 
applied to two Ae. tauschii populations to map 
leaf rust resistance gene Lr42, yielding just three 
candidate genes (Lin et al. 2022). RNA-Seq is 
limited to detecting genes that are expressed at 
the time of RNA collection in sufficient levels, 
and assembly of transcripts can be challenged 
by the co-expression of homoeologs. Lin et al. 
(2022) avoided the latter challenge by conduct- 
ing RNA-Seq on a diploid. 


10.4.3 Chromosome Flow Sorting 


Chromosome flow sorting separates an individ- 
ual chromosome via flow cytometry based on 
the chromosome size and base-pair composition 
(Doležel et al. 2011). Following separation, the 
individual chromosome can be sequenced and 
assembled as was done to complete the CSS 
assembly (IWGSC et al. 2014). Chromosome 
flow sorting is a highly specialized skill requir- 
ing unique equipment available in few labs. 
Also, not all chromosomes are able to be sorted 
from all others at sufficient efficiency to obtain 
a sample with adequate purity, and the time 
and labor needed to develop cytogenetic stocks 
such as the ditelosomics developed by Sears and 
Sears (1978) in CHINESE SPRING preclude 
that from being a viable option. Therefore, it 
is important to first determine whether a target 
chromosome can be efficiently sorted using flow 
cytometry before embarking on a project that 
relies on it to be successful. 
10.5.1 MutRenSeq 


RenSeq is coupled with mutational genom- 
ics in the MutRenSeq rapid cloning strategy 
(Steuernagel et al. 2016). In the MutRenSeq 
method, a mutant population is screened to iden- 
tify the expected mutant phenotype and then 
RenSeq is conducted on confirmed mutants 
(Fig. 10.4). Independent mutation events within 
a single NLR associated with the mutant phenotype 
reveal the candidate gene(s). Sr22 and 
Sr45 were the first R/S genes cloned in 
wheat using MutRenSeq. Sr22, which provides 
stem rust resistance, resides in introgressions 
from T. boeoticum and T. monococcum that had 
poor agronomic performance due to linkage 
drag (Olson et al. 2010). Additionally, mapping 
efforts were hampered by reduced recombina- 
tion in the $722 region (Steuernagel et al. 2016). 
To clone stem rust resistance genes $722 and 
Sr45, Steuernagel et al. (2016) developed EMS- 
mutant populations for each R gene and applied 
RenSeq to six mutants/population and the wild 
type. In each mutant population, comparative 
sequence analysis of the NLRs in the mutants 
and wild type revealed one gene with muta- 
tions in all six mutants. MutRenSeq effectively 
eliminated the need for high-resolution map- 
ping, which is particularly difficult when the 
R/S gene of interest resides in a low recombina- 
tion region. MutRenSeq has since been used to 
clone stem rust resistance genes $726, Sr27, and 
Sr61, stripe rust resistance genes Yr5 and Yr7, 
leaf rust resistance gene Lr/3/Ne2, and powdery 
mildew resistance gene Pm2/ (Xing et al. 2018; 
Marchal et al. 2018; Zhang et al. 2021a; Hewitt 
et al. 2021a, b; Yan et al. 2021; Upadhyaya et al. 
2021). 
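
The mutation-overlap step can be sketched in a few lines. The sketch
assumes that SNVs have already been called in each mutant against the
wild-type NLR assembly and reduced to (mutant, contig) pairs; the function
name, the contig identifiers, and the toy calls are hypothetical and do
not reproduce the published pipeline.

```python
from collections import defaultdict


def candidate_contigs(mutations, n_mutants, min_fraction=1.0):
    """Return NLR contigs mutated in at least min_fraction of the mutants.

    mutations: iterable of (mutant_id, contig_id) pairs, one pair per SNV
    called in a mutant relative to the wild type.
    """
    mutants_per_contig = defaultdict(set)
    for mutant_id, contig_id in mutations:
        mutants_per_contig[contig_id].add(mutant_id)
    needed = min_fraction * n_mutants
    return sorted(
        contig
        for contig, mutants in mutants_per_contig.items()
        if len(mutants) >= needed
    )


# Toy calls (hypothetical): six mutants, one contig hit in every mutant.
toy_mutations = [(f"m{i}", "NLR_017") for i in range(1, 7)]
toy_mutations += [("m1", "NLR_004"), ("m4", "NLR_004"), ("m2", "NLR_090")]

print(candidate_contigs(toy_mutations, n_mutants=6))
```

Counting distinct mutants per contig, rather than SNVs, is what makes the
independent mutation events informative: background mutations are
scattered across different NLRs in different lines, whereas the causal
gene is hit in every confirmed mutant.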

MutRenSeq is a powerful tool to quickly 
clone NLR resistance genes and is particularly 
advantageous when trying to clone a gene in an 
area of low recombination. However, it is lim- 
ited to genotypes that can be easily mutagen- 
ized and R genes in the NLR family. In general, 
higher ploidy levels tend to tolerate higher
EMS levels. The lower tolerance of lower-ploidy species to mutagen
dose results in lower mutation density, increas- 
ing the number of mutants that must be gener- 
ated and phenotypically evaluated to identify 
independent lines with mutant alleles. In some 
cases, mutagenesis of diploids can result in ster- 
ile plants making the MutRenSeq method a less 
attractive option. 
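
The relationship between mutation density and screening effort can be
made concrete with a back-of-the-envelope calculation. The sketch below
assumes, purely for illustration, that mutations fall on a gene as a
Poisson process and that only a fixed fraction of them disrupt function;
the densities, gene length, and disruptive fraction are hypothetical
values, not measurements.

```python
import math


def knockout_probability(mutations_per_kb, gene_kb, disruptive_fraction):
    """P(at least one disruptive mutation in the gene) under a Poisson model."""
    expected = mutations_per_kb * gene_kb * disruptive_fraction
    return 1.0 - math.exp(-expected)


def lines_to_screen(knockouts_wanted, p_knockout):
    """Expected number of mutagenized lines needed to recover the wanted knockouts."""
    return math.ceil(knockouts_wanted / p_knockout)


# Hypothetical values chosen only to contrast a dense and a sparse population.
GENE_KB = 4.0
DISRUPTIVE_FRACTION = 0.3
p_dense = knockout_probability(1 / 25, GENE_KB, DISRUPTIVE_FRACTION)
p_sparse = knockout_probability(1 / 100, GENE_KB, DISRUPTIVE_FRACTION)

for label, p in (("dense", p_dense), ("sparse", p_sparse)):
    print(label, round(p, 4), lines_to_screen(6, p))
```

Under these assumptions, a fourfold drop in mutation density raises the
number of lines that must be screened roughly fourfold, which is the
practical cost described above.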


10.5.2 AgRenSeq 


To address the limitations of MutRenSeq, 
Association Genetics RenSeq (AgRenSeq) 
was developed (Arora et al. 2019) by combin- 
ing association genetics and RenSeq. A diver- 
sity panel is phenotyped for disease reactions 
and RenSeq is conducted on the panel. K-mers 
within the sequenced NLRs are identified and
mapped to a reference assembly. Associations 
between k-mers and phenotypes are then cal- 
culated and plotted, similar to a Manhattan 
plot. Significant k-mers map to contigs that 
represent candidate genes. To test AgRenSeq, 
a panel of 174 Ae. tauschii ssp. strangulata 
accessions was genotyped and evaluated for 
reaction to six races of wheat stem rust patho- 
gen Puccinia graminis f. sp. tritici (PGT). Two 
previously cloned genes, Sr33 and Sr45, served 
as positive controls (Periyannan et al. 2013;
Steuernagel et al. 2016). K-mers associated
with resistance to PGT race RKQQC, which
is avirulent to Sr33, resided on the contig
containing the previously cloned Sr33. Sr45, which
was previously identified using MutRenSeq
(Steuernagel et al. 2016), was also identified
via AgRenSeq. Candidate genes for Sr46 and 
SrTA1662 were also identified in this study, 
and the Sr46 candidate was functionally vali- 
dated by mutagenesis and gene complementa- 
tion. Thus, Arora et al. (2019) demonstrated the 
ability of AgRenSeq to directly identify can- 
didate genes. However, as with other RenSeq- 
based cloning methods, AgRenSeq is limited to 
cloning NLR genes. 
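
The k-mer association step can be sketched as follows. A plain
correlation between k-mer presence/absence and phenotype score stands in
here for whatever association statistic a real pipeline applies, and
population structure is ignored; the function names, sequences, and
scores are hypothetical toy values.

```python
from statistics import mean, pstdev


def kmers(seq, k):
    """All k-mers (substrings of length k) in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def association_scores(nlr_seqs, phenotypes, k):
    """Correlate presence/absence of each k-mer with the phenotype score.

    nlr_seqs: accession -> captured NLR sequence
    phenotypes: accession -> score (here, higher means more resistant)
    """
    accessions = sorted(nlr_seqs)
    kmer_sets = {a: kmers(nlr_seqs[a], k) for a in accessions}
    all_kmers = set().union(*kmer_sets.values())
    y = [phenotypes[a] for a in accessions]
    y_mean, y_sd = mean(y), pstdev(y)
    scores = {}
    for km in all_kmers:
        x = [1.0 if km in kmer_sets[a] else 0.0 for a in accessions]
        x_mean, x_sd = mean(x), pstdev(x)
        if x_sd == 0 or y_sd == 0:
            continue  # a k-mer shared by every accession carries no signal
        cov = mean((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
        scores[km] = cov / (x_sd * y_sd)
    return scores


# Toy panel (hypothetical): resistant accessions share one variant base.
toy_seqs = {
    "acc1": "ATGGCTAAGC",
    "acc2": "ATGGCTAAGC",
    "acc3": "ATGGATAAGC",
    "acc4": "ATGGATAAGC",
}
toy_pheno = {"acc1": 1, "acc2": 1, "acc3": 0, "acc4": 0}
toy_scores = association_scores(toy_seqs, toy_pheno, k=4)
print(sorted(toy_scores.items(), key=lambda item: -item[1])[:4])
```

In a real analysis the scored k-mers would then be placed on contigs of a
reference NLR assembly, so that a cluster of highly associated k-mers
points to a candidate gene.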


Fig. 10.4 Overview of R/S gene rapid cloning methods
in wheat. The paths of MutChromSeq (orange),
MutRenSeq (red), AgRenSeq (teal), and k-mer GWAS
(seafoam) are shown with stops at particular methods
numbered and connected with solid lines. a Source of
phenotypic variation. Rapid cloning methods use one of
two forms of phenotypic variation: induced phenotypic
variation through mutagenesis (left) or natural variation
in a diversity panel (right). b Genome complexity
reduction. After phenotyping, the next step is genome
complexity reduction through either chromosome flow
sorting or R gene enrichment through gene family capture.
Note that the k-mer GWAS path moves directly to
sequencing. c Sequencing. Next, the flow-sorted chromosome,
captured genes, or diversity panel is sequenced.
Depending on the target, personal preferences, and
resources available, different sequencing methods may
be used. d R/S gene identification. The final step involves
identifying candidate genes. Left, candidate genes are
identified through comparison of mutant (light teal) and
wild-type sequences to identify regions with mutation
overlap. Right, associations between particular NLRs or
k-mers and the phenotype are identified, with the highest
associations being candidate genes or near candidate
genes. Association genetics yields candidate genes that
require functional validation, while methods using induced
variation through mutagenesis already include a
functional validation step


10.5.3 K-mer GWAS


K-mer-based association mapping, or k-mer 
GWAS, is an extension of AgRenSeq, but it 
excludes the RenSeq step and is therefore not 
limited to the detection of only NLRs. Instead, 
k-mers are identified from whole-genome shot- 
gun sequencing reads and projected onto a ref- 
erence assembly. The analysis is similar to 
AgRenSeq, but because k-mers can fall anywhere
in the genome and not just within candidate genes,
one must analyze the genes near the k-mers that are in
linkage disequilibrium with the phenotype.
Gaurav et al. (2022) conducted whole-genome
shotgun sequencing on 242 Ae. tauschii acces- 
sions and used k-mer GWAS to identify a 50-kb 
linkage disequilibrium block containing two 
candidate genes for the stem rust resistance gene 
SrTA1662. Subsequent functional validation via 
gene complementation confirmed that SrTA1662
is an NLR. The panel sequenced in Gaurav et al. 
(2022) is publicly available and can be used 
to rapidly clone R/S genes from Ae. tauschii 
accessions. 

In 2020, Voichek and Weigel published a
reference-free k-mer GWAS method. In this 
method, the associations between k-mers and the 
phenotype were calculated prior to mapping the 
k-mers to a reference, allowing the identifica- 
tion of k-mers significantly associated with the 
trait, including those absent in a reference. In 
a case study in Arabidopsis, the authors identified
k-mers significantly associated with two
traits (growth in the presence of a flg22 variant
and germination in darkness under low nutrient
supply) that could not be mapped to their reference
genome. Assembly and subsequent analysis
of the short reads used to identify the signifi- 
cant k-mers revealed alternate structural variants 
of genes associated with the two traits. Although 
reference-free k-mer GWAS has not yet been 
used to clone R/S genes in wheat, it has been 
applied to map resistance to yellow rust and 
leaf rust (Kale et al. 2022). R/S genes display 
abundant presence/absence and copy number 
variation (Van de Weyer et al. 2019; Walkowiak 
et al. 2020), so the potential to detect structural 
variants not in a reference assembly via
reference-free k-mer GWAS is appealing.
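
The order of operations in a reference-free analysis can be sketched as
follows: k-mers are first tied to the trait and only afterwards compared
with a reference. In this sketch a strict presence/absence rule stands in
for the statistical test of a real k-mer GWAS, and the function names,
reads, and reference sequence are hypothetical toy values.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Strand-independent form: the smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def read_kmers(reads, k):
    """Set of canonical k-mers found in a collection of reads."""
    found = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            found.add(canonical(read[i:i + k]))
    return found


def trait_specific_kmers(reads_by_accession, resistant, k):
    """k-mers present in every resistant accession and in no other accession."""
    sets = {a: read_kmers(r, k) for a, r in reads_by_accession.items()}
    in_all_resistant = set.intersection(*(sets[a] for a in resistant))
    in_any_other = set().union(
        *(s for a, s in sets.items() if a not in resistant)
    )
    return in_all_resistant - in_any_other


def split_by_reference(kmer_set, reference, k):
    """Partition k-mers into those found in the reference and those absent from it."""
    reference_kmers = read_kmers([reference], k)
    return kmer_set & reference_kmers, kmer_set - reference_kmers


# Toy data (hypothetical): resistant accessions carry an insertion
# that the reference sequence lacks.
toy_reference = "ACGTTGCAAC"
toy_reads = {
    "res1": ["ACGTTGGGGGCAAC"],
    "res2": ["ACGTTGGGGGCAAC"],
    "sus1": ["ACGTTGCAAC"],
    "sus2": ["ACGTTGCAAC"],
}
specific = trait_specific_kmers(toy_reads, {"res1", "res2"}, k=5)
in_ref, not_in_ref = split_by_reference(specific, toy_reference, k=5)
print(len(specific), len(in_ref), len(not_in_ref))
```

Because the association is computed before any mapping, the k-mers that
fall in the second partition are retained rather than lost, and the reads
containing them can be assembled to recover the underlying variant.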

Both AgRenSeq and k-mer GWAS require
sequencing of an entire diversity panel (NLR
capture for AgRenSeq, whole-genome shotgun
sequencing for k-mer GWAS),
which can initially be expensive and labori- 
ous. However, once this has been completed, 
the same panel can be used to clone multiple 
R/S genes. Additionally, AgRenSeq and k-mer 
GWAS can be limited by the population struc- 
ture of the diversity panel (Yu et al. 2006) and 
choice of the reference sequence can influence 
which associations are detected (Voichek and 
Weigel 2020; Kale et al. 2022). 


10.5.4 MutChromSeq 


In 2016, Sánchez-Martín et al. published the
rapid cloning method MutChromSeq and used
it to clone the powdery mildew resistance gene
Pm2a, which had previously been mapped to
chromosome 6A (Huang and Röder 2004). In
the MutChromSeq method, which uses
chromosome flow sorting as its RRS method,
chromosome 6A was sorted from six confirmed
EMS-derived powdery mildew susceptible mutants
and wild-type genotypes. The separated chromosomes
were sequenced and assembled, followed
by sequence analysis to identify mutation overlap.
Contigs with mutations in all or most of the
mutant lines are most likely to contain the candidate
gene. Two contigs were identified, although
one was later discarded due to an abnormal SNV
frequency, leaving just one contig with an NLR
gene. MutChromSeq is similar to MutRenSeq,
but it is not limited to NLR genes. MutChromSeq
was also used to clone leaf rust resistance gene
Lr14a, which encodes a protein with ankyrin repeat
and transmembrane domains, and Pm4b, which contains
kinase, C2, and transmembrane domains (Kolodziej et al.
2021; Sánchez-Martín et al. 2021).
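
The contig-ranking step, including a filter for abnormal SNV frequency,
can be sketched as follows. The sketch assumes that SNVs have been counted
per contig and per mutant against the wild-type chromosome assembly; the
function name, the density threshold, and the toy counts are hypothetical
and are not the criteria used by Sánchez-Martín et al. (2016).

```python
def rank_contigs(snvs, contig_kb, max_snvs_per_kb=1.0):
    """Rank contigs by the number of mutants carrying at least one SNV.

    snvs: contig -> {mutant: number of SNVs relative to wild type}
    contig_kb: contig -> contig length in kb
    Contigs whose SNV density exceeds max_snvs_per_kb are flagged as suspect.
    """
    rows = []
    for contig, per_mutant in snvs.items():
        mutants_hit = sum(1 for n in per_mutant.values() if n > 0)
        density = sum(per_mutant.values()) / contig_kb[contig]
        rows.append({
            "contig": contig,
            "mutants_hit": mutants_hit,
            "snvs_per_kb": density,
            "suspect": density > max_snvs_per_kb,
        })
    # Most mutants first; among ties, the lower SNV density first.
    rows.sort(key=lambda r: (-r["mutants_hit"], r["snvs_per_kb"]))
    return rows


# Toy counts (hypothetical) for three contigs of 10 kb each.
toy_snvs = {
    "contig_A": {f"m{i}": 1 for i in range(1, 7)},   # one SNV per mutant
    "contig_B": {f"m{i}": 10 for i in range(1, 7)},  # implausibly many SNVs
    "contig_C": {"m1": 1, "m2": 0, "m3": 0, "m4": 0, "m5": 0, "m6": 0},
}
toy_kb = {"contig_A": 10.0, "contig_B": 10.0, "contig_C": 10.0}
ranking = rank_contigs(toy_snvs, toy_kb)
for row in ranking:
    print(row)
```

A contig that is hit in every mutant but carries far more SNVs than
chemical mutagenesis would plausibly produce is more likely to reflect an
assembly or alignment artifact than the causal gene, which is why it is
flagged rather than ranked first.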


10.6 Validating Candidate Genes 


Validating candidate genes is a critical step
in proving that a gene confers a particular phenotype.
Forward and reverse genetics approaches
can be used to identify and validate candidate
genes. Forward genetics approaches start from
a phenotype and identify the gene that confers
the phenotype (Fig. 10.5). Many of the rapid
cloning methods are considered forward genetics
approaches as they start with variation in a
phenotype, either natural or induced through
mutagenesis. However, not all forward genetics
approaches serve as functional validation methods.
For example, map-based cloning and association
genetics approaches often yield multiple
candidate genes and must be followed up with
functional validation to determine which candidate
gene is the gene of interest. Because the rapid
cloning methods MutRenSeq and MutChromSeq
use mutagenized populations, these methods
both identify and validate candidate genes.
Reverse genetics approaches start with the
gene sequence and identify the phenotypic
effects of particular gene states (Fig. 10.5).
Functional validation methods that use a
reverse genetics approach include RNA
interference, gene complementation, and
CRISPR-Cas9 gene editing, which
alter the genetic or transcriptomic makeup of
an individual to identify the phenotypic effect
of the alteration. The Targeting Induced Local
Lesions in Genomes (TILLING) resource can
also be used to functionally validate genes in a
reverse genetics manner. Krasileva et al. (2017)
sequenced the exomes of 1200 CADENZA
and 1535 KRONOS EMS mutants and characterized
and cataloged the mutations relative
to the CSS assembly. When the CHINESE
SPRING-RefSeqv1.0 assembly was published,
the TILLING raw reads were realigned to the
new assembly. The TILLING resources expedite
functional validation of genes as researchers do
not need to create the genetic or transcriptomic
alteration. Instead, mutant lines with known
alterations in candidate genes can be selected
on Ensembl Plants and ordered from SeedStor
(https://www.seedstor.ac.uk/). However, the
TILLING resource is limited to functionally
validating genes present in CADENZA
or KRONOS and annotated in the CHINESE
SPRING-RefSeqv1.1 gene models.

[Figure 10.5 panels. Forward genetics (phenotype to
genotype): random mutagenesis, positional cloning,
association genetics. Reverse genetics (genotype to
phenotype): TILLING populations, gene complementation,
gene editing (CRISPR-Cas9), gene silencing
(RNAi/virus-induced), transient expression.]

Fig. 10.5 Commonly used forward and reverse genetics
methods to identify and/or validate R/S genes in wheat.
Arrows indicate the direction of the genetic approaches,
with forward genetics approaches starting with a known
phenotype and identifying the gene underlying the
phenotype, while reverse genetics starts with a known
gene sequence and identifies the phenotypic effects of
genic or transcriptomic alterations. Common methods
used to identify and/or validate R/S genes in wheat are
listed under their approach type

Often both forward and reverse genetics 
approaches are applied to functionally validate 
R/S genes. The two most commonly used func- 
tional validation methods are mutagenesis and 
gene complementation, both of which have been 
used to validate around two-thirds of the cloned 
R/S genes. About 43% of the cloned R/S genes 
have been validated using both mutagenesis and 
gene complementation. Gene silencing, tran- 
sient expression, gene editing, and the TILLING 
populations are less frequently used methods of 
functional validation, with each being used to
validate 15 or fewer R/S genes. 

Clustered regularly interspaced short palindromic
repeats (CRISPR) and its associated protein
(Cas) can be used to produce site-specific
double-stranded breaks, often resulting in gene
knockout. Wang et al. (2014) used CRISPR-Cas9
and transcription activator-like effector
nuclease (TALEN) technologies to knock out
three homoeoalleles of the powdery mildew
susceptibility gene Mlo in the cultivar BOBWHITE,
resulting in reduced susceptibility to Blumeria
graminis f. sp. tritici. While CRISPR-Cas-mediated
gene knockout is, unlike random mutagenesis,
a highly specific and targeted functional
validation method, it is largely limited to
functionally validating genes present in easily
transformable cultivars such as FIELDER or
BOBWHITE. However, advancements in gene
editing and transformation methods are expanding
the definition of "transformable cultivars."
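
Targeting all homoeoalleles at once requires a guide sequence that is
conserved across the homoeologous copies. The sketch below assumes the
commonly used SpCas9 system with a 20-nt protospacer followed by an NGG
PAM, scans the forward strand only, and ignores off-target scoring; the
function names and the three toy sequences are hypothetical and do not
represent the Mlo homoeologs.

```python
import re


def protospacers(seq, length=20):
    """Forward-strand protospacers that sit immediately 5' of an NGG PAM."""
    seq = seq.upper()
    found = set()
    # A lookahead is used so that overlapping sites are all reported.
    pattern = r"(?=([ACGT]{%d})[ACGT]GG)" % length
    for match in re.finditer(pattern, seq):
        found.add(match.group(1))
    return found


def shared_guides(homoeologs, length=20):
    """Protospacers present in every homoeologous copy of the target gene."""
    sets = [protospacers(s, length) for s in homoeologs.values()]
    return set.intersection(*sets)


# Toy sequences (hypothetical): one conserved site with a different PAM
# base and different flanking bases in each subgenome copy.
GUIDE = "GATTACAGATTACAGATTAC"
toy_copies = {
    "A": "TT" + GUIDE + "TGG" + "AA",
    "B": "CC" + GUIDE + "AGG" + "TT",
    "D": "GA" + GUIDE + "CGG" + "CA",
}
print(shared_guides(toy_copies))

# A single SNP inside the site of one copy removes the shared guide.
variant = GUIDE[:10] + "C" + GUIDE[11:]
toy_copies_snp = dict(toy_copies, D="GA" + variant + "CGG" + "CA")
print(shared_guides(toy_copies_snp))
```

The second call illustrates why sequence divergence among homoeologs
matters for guide design: a guide that matches only two of three copies
would leave one functional homoeoallele in place.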


10.7 Conclusions and Future Outlook


The expansion of wheat genomic resources, 
genomic complexity reduction methods coupled 
with advanced sequencing technologies, and 
rapid cloning methods has enabled the acceler- 
ated cloning of R/S genes in wheat. In 2020 and 
2021, sixteen cloned R/S genes were published, 
a feat that in the earlier years of R/S gene clon- 
ing took thirteen years to accomplish; it was 
not unheard of for cloning an R/S gene to take 
10 years. Now, cloning an R/S gene is possible 
in less than a year. Undoubtedly, R/S gene clon- 
ing will continue to accelerate as more refer- 
ence genomes are published, sequencing costs 
decrease, and cloning methods advance. The 
multiple sequenced wheat genomes that are cur- 
rently available are a tremendous resource and 
make it relatively easy to assess gene content in a 
given R/S candidate gene region. However,
given the common presence/absence and copy 
number variation displayed by R/S genes (Van 
de Weyer et al. 2019; Walkowiak et al. 2020), it 
is still possible for a gene of interest to be absent 
in all the wheat genomes currently available.
We have not yet reached a true wheat pange- 
nome, but costs for sequencing and assembly of 
entire wheat genomes continue to decline, and 
the data can be obtained in a matter of months. 
Therefore, it is now becoming more feasible to 
sequence and assemble the entire genome of a 
wheat line with the primary goal of cloning a sin- 
gle R/S gene, a feat that was nearly unthinkable 
when wheat genomics researchers met in 2003 to 
carve a path forward to sequence the first wheat 
genome (Gill et al. 2004). 

Wheat's wild relatives offer a greater pool of 
R genes, as they have not undergone the genetic 
bottleneck characteristic of domestication. 
Association genetics methods, like k-mer GWAS 
and AgRenSeq, address some of the limitations 
of traditional map-based cloning, exploiting 
greater genetic diversity and ancestral recombi- 
nation to identify unique disease resistance loci. 
Additionally, these diversity panels often allow 
the isolation of more than one R gene as they 
segregate for resistance to multiple isolates and/ 
or races of multiple pathogens, whereas bipa- 
rental mapping populations are often designed 
to segregate for only one R/S locus for ease of 
genetic mapping. The advances in sequencing 
technologies, cloning methods, and gene edit- 
ing technologies will likely soon reshape the 
way R genes from wild relatives are deployed in 
adapted germplasm. Historically, chromosome 
engineering strategies involving cytogenetic 
methods to achieve chromosome substitutions, 
translocations, and ultimately introgressions 
of smaller segments containing target genes, 
were extremely laborious and time-consuming, 
and the end product usually suffered from del- 
eterious linkage drag. The modern sequenc- 
ing and cloning technologies discussed in this 
chapter may make it more feasible to clone the 
target gene in the wild relative accession itself. 
Although genetic transformation (GMO wheat) 
is currently not accepted, the acceptance of gene 
editing appears more promising. Thus, once 
a target R gene is cloned from a wild relative, 
it is conceivable that a homologous gene could 
be identified in wheat and edited to acquire the 
desired function. 
With the availability of multiple reference
wheat genomes, in some cases, the bottleneck of
cloning R/S genes has shifted from candidate gene
identification to functional validation. The use of
the CADENZA- and KRONOS-TILLING populations
offers rapid functional validation. However, a
single bread wheat cultivar and a single durum
wheat cultivar cannot feasibly represent the R/S
gene content of all bread and durum wheat. Still,
due to ease of use and affordability, the TILLING
populations are an excellent resource worth
considering.

Cloning and deploying R genes and removing
S genes is a constant, highly coordinated race
to keep up with evolving pathogen populations.
We suspect that as more R/S genes are cloned, 
more research will focus on identifying unique 
durable combinations of R/S genes. For example,
Luo et al. (2021) transformed a five-gene
cassette of stem rust resistance genes into bread 
wheat cultivar FIELDER, resulting in broad- 
spectrum resistance. Another benefit of cloning 
multiple R/S genes in a given system is that the 
cumulative knowledge acquired can begin to 
shed light on the essential components, which 
can lead to the development of designer genes 
that could operate to govern broad-spectrum 
resistance and perhaps resistance less prone 
to being overcome due to natural mutations 
occurring in the pathogen. The use of R gene 
cassettes, disruption of S genes, or the develop- 
ment and deployment of designer genes made 
possible through advancements in tissue cul- 
ture, transformation methods, and gene editing 
technologies are promising directions to ensure
stable wheat production and enhance global food
security.


Acknowledgements Figures were created with
Biorender.com.


References 


Acevedo-Garcia J, Spencer D, Thieron H, Reinstädler
A, Hammond-Kosack K, Phillips AL, Panstruga R
(2017) mlo-based powdery mildew resistance in
hexaploid bread wheat generated by a non-transgenic
TILLING approach. Plant Biotechnol J 15:367-378


Akpinar BA, Leroy P, Watson-Haigh NS et al (2022)
The complete genome sequence of elite bread wheat 
cultivar, “Sonmez”. F1000Res 11:614. https://doi. 
org/10.12688/f1000research.121637.1 

Arora S, Steuernagel B, Gaurav K et al (2019)
Resistance gene cloning from a wild crop relative 
by sequence capture and association genetics. Nat 
Biotechnol  37:139-143.  https://doi.org/10.1038/ 
s41587-018-0007-9 

Athiyannan N, Abrouk M, Boshoff WHP et al (2022) 
Long-read genome sequencing of bread wheat 
facilitates disease resistance gene cloning. Nat 
Genet 54:227-231. https://doi.org/10.1038/ 
s41588-022-01022-1 

Aury J-M, Engelen S, Istace B et al (2022) Long-read 
and chromosome-scale assembly of the hexaploid 
wheat genome achieves high resolution for research 
and breeding. GigaScience 11:giac034. https://doi. 
org/10.1093/gigascience/giac034 

Avni R, Nave M, Barad O et al (2017) Wild emmer
genome architecture and diversity elucidate wheat 
evolution and domestication. Science 357:93-97. 
https://doi.org/10.1126/science.aan0032 

Avni R, Lux T, Minz-Dub A et al (2022) Genome
sequences of three Aegilops species of the section 
Sitopsis reveal phylogenetic relationships and provide 
resources for wheat improvement. Plant J 110:179- 
192. https://doi.org/10.1111/tpj.15664 

Bennett MD, Smith J (1976) Nuclear DNA amounts in 
angiosperms. Phil Trans R Soc B 274:227-274 

Brenchley R, Spannagl M, Pfeifer M et al (2012)
Analysis of the bread wheat genome using whole- 
genome shotgun sequencing. Nature 491:705-710. 
https://doi.org/10.1038/nature11650 

Cao A, Xing L, Wang X et al (2011) Serine/threonine
kinase gene Stpk-V, a key member of powdery
mildew resistance gene Pm21, confers powdery
mildew resistance in wheat. Proc Natl Acad Sci 
USA 108:7727-7732. https://doi.org/10.1073/ 
pnas.1016981108 

Chapman JA, Mascher M, Buluç A et al (2015) A
whole-genome shotgun approach for assembling
and anchoring the hexaploid bread wheat genome.
Genome Biol 16:26.
https://doi.org/10.1186/s13059-015-0582-8

Chen S, Zhang W, Bolus S et al (2018) Identification and 
characterization of wheat stem rust resistance gene 
Sr21 effective against the Ug99 race group at high 
temperature. PLoS Genet 14:e1007287. https://doi. 
org/10.1371/journal.pgen.1007287 

Chen S, Rouse MN, Zhang W et al (2020) Wheat gene 
Sr60 encodes a protein with two putative kinase 
domains that confers resistance to stem rust. New 
Phytol 225:948-959. https://doi.org/10.1111/ 
nph.16169 

Choulet F, Alberti A, Theil S et al (2014) Structural and 
functional partitioning of bread wheat chromosome 
3B. Science 345:1249721. https://doi.org/10.1126/ 
science.1249721 


Clavijo BJ, Venturini L, Schudoma C et al (2017) An
improved assembly and annotation of the allohexa- 
ploid wheat genome identifies complete families of 
agronomic genes and provides genomic evidence for 
chromosomal translocations. Genome Res 27:885- 
896. https://doi.org/10.1101/gr.217117.116 

Cloutier S, McCallum BD, Loutre C et al (2007) Leaf 
rust resistance gene Lr1, isolated from bread wheat
(Triticum aestivum L.) is a member of the large 
psr567 gene family. Plant Mol Biol 65:93-106. 
https://doi.org/10.1007/s11103-007-9201-8 

Doležel J, Kubaláková M, Číhalíková J et al (2011)
Chromosome analysis and sorting using flow cytom- 
etry. In: Birchler JA (ed) Plant chromosome engineer- 
ing. Humana Press, Totowa, NJ, pp 221-238 

Elliott C, Zhou F, Spielmeyer W, Panstruga R, Schulze- 
Lefert P (2002) Functional conservation of wheat 
and rice Mlo orthologs in defense modulation to the 
powdery mildew fungus. Mol Plant Microbe Interact 
15:1069-1077 

Endo TR, Gill BS (1996) The deletion stocks of common 
wheat. J Hered 87:295-307

Erenstein O, Jaleta M, Mottaleb KA et al (2022)
Global trends in wheat production, consumption
and trade. In: Reynolds MP, Braun HJ (eds)
Wheat Improvement. Springer, Cham.
https://doi.org/10.1007/978-3-030-90673-3_4

Faris JD (2014) Wheat domestication: key to agricultural 
revolutions past and future. In: Tuberosa R, Graner A, 
Frison E (eds) Genomics of plant genetic resources. 
Springer, Netherlands, Dordrecht, pp 439-464

Faris JD, Zhang Z, Lu H et al (2010) A unique wheat
disease resistance-like gene governs effector-triggered
susceptibility to necrotrophic pathogens. Proc
Natl Acad Sci USA 107:13544-13549.
https://doi.org/10.1073/pnas.1004090107

Feuillet C, Travella S, Stein N et al (2003) Map-based 
isolation of the leaf rust disease resistance gene Lr10
from the hexaploid wheat (Triticum aestivum L.) 
genome. Proc Natl Acad Sci USA 100:15253-15258. 
https://doi.org/10.1073/pnas.2435133100 

Fu D, Uauy C, Distelfeld A et al (2009) A kinase-START 
gene confers temperature-dependent resistance to 
wheat stripe rust. Science 323:1357-1360. https://doi. 
org/10.1126/science.1166289 

Gardiner L-J, Brabbs T, Akhunov A et al (2019)
Integrating genomic resources to present full gene
and putative promoter capture probe sets for bread
wheat. GigaScience 8:1-13.
https://doi.org/10.1093/gigascience/giz018

Gaurav K, Arora S, Silva P et al (2022) Population
genomic analysis of Aegilops tauschii identi- 
fies targets for bread wheat improvement. Nat 
Biotechnol 40:422-431.  https://doi.org/10.1038/ 
s41587-021-01058-4 

Gill BS, Appels R, Botha-Oberholster A-M et al (2004) 
A workshop report on wheat genome sequencing. 
Genetics 168:1087-1096.
https://doi.org/10.1534/genetics.104.034769


K. L. D. Running and J. D. Faris 


Guo W, Xin M, Wang Z et al (2020) Origin and adap- 
tation to high altitude of Tibetan semi-wild wheat. 
Nat Commun  11:5085. https://doi.org/10.1038/ 
s41467-020-18738-5 

Hafeez AN, Arora S, Ghosh S et al (2021) Creation and 
judicious application of a wheat resistance gene atlas. 
Mol Plant 14:1053-1070. https://doi.org/10.1016/j. 
molp.2021.05.014 

He H, Zhu S, Ji Y et al (2017) Map-based cloning
of the gene Pm21 that confers broad spectrum 
resistance to wheat powdery mildew. https://doi. 
org/10.1101/177857 

Hewitt T, Müller MC, Molnar I et al (2021a) A highly
differentiated region of wheat chromosome 7AL
encodes a Pm1a immune receptor that recognizes
its corresponding AvrPm1a effector from Blumeria
graminis. New Phytol 229:2812-2826.
https://doi.org/10.1111/nph.17075

Hewitt T, Zhang J, Huang L et al (2021b) Wheat leaf
rust resistance gene Lr13 is a specific Ne2 allele for
hybrid necrosis. Mol Plant 14:1025-1028.
https://doi.org/10.1016/j.molp.2021.05.010

Huai B, Yuan P, Ma X et al (2022) Sugar transporter
TaSTP3 activation by TaWRKY19/61/82 enhances 
stripe rust susceptibility in wheat. New Phytol. 
https://doi.org/10.1111/nph.18331 

Huang X-Q, Röder MS (2004) Molecular mapping of
powdery mildew resistance genes in wheat: a review.
Euphytica 137:203-223.
https://doi.org/10.1023/B:EUPH.0000041576.74566.d7

Huang L, Brooks SA, Li W et al (2003) Map-based
cloning of leaf rust resistance gene Lr21 from
the large and polyploid genome of bread wheat. 
Genetics 164:655-664.  https://doi.org/10.1093/ 
genetics/164.2.655 

Hurni S, Brunner S, Buchmann G et al (2013) Rye Pm8 
and wheat Pm3 are orthologous genes and show evo- 
lutionary conservation of resistance function against 
powdery mildew. Plant J 76:957-969. https://doi.
org/10.1111/tpj.12345 

Ingvardsen CR, Massange-Sánchez JA, Borum F et al 
(2019) Development of mlo-based resistance in tetra- 
ploid wheat against wheat powdery mildew. Theor 
Appl Genet 132:3009-3022. https://doi.org/10.1007/ 
s00122-019-03402-4 

International Wheat Genome Sequencing Consortium 
(IWGSC) (2014) A chromosome-based draft 
sequence of the hexaploid bread wheat (Triticum 
aestivum) genome. Science 345:1251788. https://doi.
org/10.1126/science.1251788 

Jia J, Zhao S, Kong X et al (2013) Aegilops tauschii
draft genome sequence reveals a gene repertoire for
wheat adaptation. Nature 496:91-95.
https://doi.org/10.1038/nature12028

Jordan KW, Wang S, Lun Y et al (2015) A haplotype map 
of allohexaploid wheat reveals distinct patterns of 
selection on homoeologous genomes. Genome Biol 
16:48. https://doi.org/10.1186/s13059-015-0606-4 


10 Rapid Cloning of Disease Resistance Genes in Wheat 


Jupe F, Witek K, Verweij W et al (2013) Resistance gene 
enrichment sequencing (RenSeq) enables reannota- 
tion of the NB-LRR gene family from sequenced 
plant genomes and rapid mapping of resistance loci 
in segregating populations. Plant J 76:530-544. 
https://doi.org/10.1111/tpj.12307 

Kale SM, Schulthess AW, Padmarasu S et al (2022) A 
catalogue of resistance gene homologs and a chro- 
mosome-scale reference sequence support resistance 
gene mapping in winter wheat. Plant Biotechnol J 
pbi.13843. https://doi.org/10.1111/pbi.13843 

Kan J, Cai Y, Cheng C et al (2022) Simultaneous editing
of host factor gene TaPDIL5-1 homoeoalleles confers
wheat yellow mosaic virus resistance in hexaploid
wheat. New Phytol 234:340-344.
https://doi.org/10.1111/nph.18002

Klymiuk V, Yaniv E, Huang L et al (2018) Cloning of the 
wheat Yr15 resistance gene sheds light on the plant
tandem kinase-pseudokinase family. Nat Commun 
9:3735. https://doi.org/10.1038/s41467-018-06138-9 

Kolodziej MC, Singla J, Sánchez-Martín J et al (2021) 
A membrane-bound ankyrin repeat protein confers 
race-specific leaf rust disease resistance in wheat. 
Nat Commun 12:956.  https://doi.org/10.1038/ 
s41467-020-20777-x 

Krasileva KV, Vasquez-Gross HA, Howell T et al (2017) 
Uncovering hidden variation in polyploid wheat. Proc 
Natl Acad Sci USA 114. https://doi.org/10.1073/ 
pnas.1619268114 

Krattinger SG, Lagudah ES, Spielmeyer W et al
(2009) A putative ABC transporter confers durable
resistance to multiple fungal pathogens in wheat.
Science 323:1360-1363.
https://doi.org/10.1126/science.1166453

Li G, Zhou J, Jia H et al (2019) Mutation of a
histidine-rich calcium-binding-protein gene in wheat
confers resistance to Fusarium head blight. Nat
Genet 51:1106-1112.
https://doi.org/10.1038/s41588-019-0426-7

Li M, Dong L, Li B et al (2020) A CNL protein in wild
emmer wheat confers powdery mildew resistance.
New Phytol 228:1027-1037.
https://doi.org/10.1111/nph.16761

Li L-F, Zhang Z-B, Wang Z-H et al (2022) Genome
sequences of five Sitopsis species of Aegilops
and the origin of polyploid wheat B subgenome.
Mol Plant 15:488-503.
https://doi.org/10.1016/j.molp.2021.12.019

Lin G, Chen H, Tian B et al (2022) Cloning of the
broadly effective wheat leaf rust resistance gene Lr42
transferred from Aegilops tauschii. Nat Commun
13:3044. https://doi.org/10.1038/s41467-022-30784-9

Ling H-Q, Zhao S, Liu D et al (2013) Draft genome
of the wheat A-genome progenitor Triticum
urartu. Nature 496:87-90.
https://doi.org/10.1038/nature11997

Ling H-Q, Ma B, Shi X et al (2018) Genome sequence
of the progenitor of wheat A subgenome Triticum
urartu. Nature 557:424-428.
https://doi.org/10.1038/s41586-018-0108-0

Liu W, Frick M, Huel R et al (2014) The stripe rust
resistance gene Yr10 encodes an evolutionary-conserved
and unique CC-NBS-LRR sequence
in wheat. Mol Plant 7:1740-1755.
https://doi.org/10.1093/mp/ssu112

Lu P, Guo L, Wang Z et al (2020) A rare gain of function 
mutation in a wheat tandem kinase confers resistance
to powdery mildew. Nat Commun 11:680. https://doi. 
org/10.1038/s41467-020-14294-0 

Luo M-C, Gu YQ, Puiu D et al (2017) Genome
sequence of the progenitor of the wheat D genome
Aegilops tauschii. Nature 551:498-502.
https://doi.org/10.1038/nature24486

Luo M, Xie L, Chakraborty S et al (2021) A
five-transgene cassette confers broad-spectrum
resistance to a fungal rust pathogen in wheat. Nat
Biotechnol 39:561-566.
https://doi.org/10.1038/s41587-020-00770-x

Maccaferri M, Harris NS, Twardziok SO et al (2019)
Durum wheat genome highlights past domestica- 
tion signatures and future improvement targets. 
Nat Genet 51:885-895.  https://doi.org/10.1038/ 
s41588-019-0381-3 

Mago R, Zhang P, Vautrin S et al (2015) The wheat
Sr50 gene reveals rich diversity at a cereal disease 
resistance locus. Nat Plants 1:15186. https://doi. 
org/10.1038/nplants.2015.186 

Marchal C, Zhang J, Zhang P et al (2018)
BED-domain-containing immune receptors confer
diverse resistance spectra to yellow rust. Nat Plants
4:662-668. https://doi.org/10.1038/s41477-018-0236-4

Mello-Sampayo T, Lorente R (1968) The role of
chromosome 3D in the regulation of meiotic pairing in
hexaploid wheat. EWAC Newslett 2:16-24

Moore JW, Herrera-Foessel S, Lan C et al (2015) A
recently evolved hexose transporter variant confers
resistance to multiple pathogens in wheat. Nat Genet
47:1494-1498. https://doi.org/10.1038/ng.3439

Olson EL, Brown-Guedira G, Marshall D et al (2010) 
Development of wheat lines having a small intro- 
gressed segment carrying stem rust resistance 
gene $722. Crop Sci 50:1823-1830. https://doi. 
org/10.2135/cropsci2009.11.0652 

Periyannan S, Moore J, Ayliffe M et al (2013) The gene 
$r33, an ortholog of barley Mla genes, encodes resist- 
ance to wheat stem rust race Ug99. Science 341:786— 
788. https://doi.org/10.1126/science.1239028 

Poddar S, Tanaka J, Running KLD, Kariyawasam 
GK, Faris JD, Friesen TL, Cho M-J, Cate 
JHD, Staskawicz B (2022) bioRxiv. https://doi. 
org/10.1101/2022.04.05.487229 

Poddar S, Tanaka J, Running KLD etal (2023) 
Optimization of highly efficient exogenous-DNA- 
free Cas9-ribonucleoprotein mediated gene editing 
in disease susceptibility loci in wheat (Triticum aes- 
tivum L.) Front Plant Sci 13:1084700. https://doi. 
org/10.3389/fpls.2022. 1084700 


210 


Qi LL, Echalier B, Chao S et al (2004) A chromosome 
bin map of 16,000 expressed sequence tag loci and 
distribution of genes among the three genomes of 
polyploid wheat. Genetics 168:701—712. https://doi. 
org/10.1534/genetics.104.034868 

Rawat N, Pumphrey MO, Liu S et al (2016) Wheat Fhb1 
encodes a chimeric lectin with agglutinin domains 
and a pore-forming toxin-like domain confer- 
ring resistance to Fusarium head blight. Nat Genet 
48:1576—1580. https://doi.org/10.1038/ng.3706 

Riley R, Chapman V (1958) Genetic control of the cyto- 
logically diploid behaviour of hexploid wheat. Nature 
182:713-715 

Saintenac C, Zhang W, Salcedo A etal (2013) 
Identification of wheat gene $735 that confers resist- 
ance to Ug99 stem rust race group. Science 341:783- 
786. https://doi.org/10.1126/science.1239022 

Saintenac C, Lee W-S, Cambon F etal (2018) Wheat 
receptor-kinase-like protein Stb6 controls gene-for- 
gene resistance to fungal pathogen Zymoseptoria trit- 
ici. Nat Genet 50:368—374. https://doi.org/10.1038/ 
s41588-018-0051-x 

Saintenac C, Cambon F, Aouini L et al (2021) A wheat 
cysteine-rich receptor-like kinase confers broad- 
spectrum resistance against Septoria tritici blotch. 
Nat Commun 12:433.  https://doi.org/10.1038/ 
s41467-020-20685-0 

Sánchez-Martín J, Steuernagel B, Ghosh S et al (2016) 
Rapid gene isolation in barley and wheat by mutant 
chromosome sequencing. Genome Biol 17:221. 
https://doi.org/10.1186/s13059-016-1082-1 

Sánchez-Martín J, Widrig V, Herren G etal (2021) 
Wheat Pm4 resistance to powdery mildew is con- 
trolled by alternative splice variants encoding chi- 
meric proteins. Nat Plants 7:327—341. https://doi. 
org/10.1038/s41477-021-00869-2 

Sato K, Abe F, Mascher M et al (2021) Chromosome- 
scale genome assembly of the transformation-ame- 
nable common wheat cultivar ‘Fielder. DNA Res 
28:dsab008. https://doi.org/10.1093/dnares/dsab008 

Savary S, Bregaglio S, Willocquet L et al (2017) Crop 
health and its global impacts on the components 
of food security. Food Sec 9:311—327. https://doi. 
0rg/10.1007/s12571-017-0659-1 

Savary S, Willocquet L, Pethybridge SJ etal (2019) 
The global burden of pathogens and pests on major 
food crops. Nat Ecol Evol 3:430-439. https://doi. 
org/10.1038/s41559-018-0793-y 

Schreiber AW, Hayden MJ, Forrest KL etal (2012) 
Transcriptome-scale homoeolog-specific transcript 
assemblies of bread wheat. BMC Genomics 13:492. 
https://doi.org/10.1186/1471-2164-13-492 

Sears ER (1954) The aneuploids of common wheat. Mo 
Agr Exp Sta Res Bull 572:1—59 

Sears ER (1966) Nullisomic-tetrasomic combinations 
in hexaploid wheat. In: Riley R, Lewis KR (eds) 
Chromosome manipulation and plant genetics. Oliver 
& Boyd, Edinburgh, pp 29-45 


K. L. D. Running and J. D. Faris 


Sears ER, Okamoto M (1958) Intergenomic chromosome 
relationships in hexaploid wheat. Proc Int Congr 
Genet 2:258-259 

Sears ER, Sears LMS (1978) The telocentric chromo- 
somes of common wheat In: Ramanujam S (ed) 
Proceedings of the 5th international wheat genetics 
symposium. Indian Society of Genetics and Plant 
Breeding, New Delhi, pp 389—407 

Shi G, Zhang Z, Friesen TL et al (2016) The hijacking 
of a receptor kinase-driven pathway by a wheat fun- 
gal pathogen leads to disease. Sci Adv 2:e1600822. 
https://doi.org/10.1126/sciadv. 1600822 

Singh SP, Hurni S, Ruinelli M et al (2018) Evolutionary 
divergence of the rye Pm17 and Pm8 resistance genes 
reveals ancient diversity. Plant Mol Biol 98:249-260. 
https://doi.org/10.1007/s11103-018-0780-3 

Srichumpa P, Brunner S, Keller B, Yahiaoui N (2005) 
Allelic series of four powdery mildew resistance 
genes at the Pm3 locus in hexaploid bread wheat. 
Plant Physiol 139:885—895. https://doi.org/10.1104/ 
pp.105.062406 

Steuernagel B, Periyannan SK, Hernández-Pinzón I et al 
(2016) Rapid cloning of disease-resistance genes in 
plants using mutagenesis and sequence capture. Nat 
Biotechnol | 34:652-655.  https://doi.org/10.1038/ 
nbt.3543 

Su Z, Bernardo A, Tian B et al (2019) A deletion muta- 
tion in TaHRC confers Fhbl resistance to Fusarium 
head blight in wheat. Nat Genet 51:1099-1105. 
https://doi.org/10.1038/s41588-019-0425-8 

The International Wheat Genome Sequencing 
Consortium (IWGSC), Mayer KFX, Rogers J etal 
(2014) A chromosome-based draft sequence of the 
hexaploid bread wheat (Triticum aestivum) genome. 


Science 345:1251788. https://doi.org/10.1126/ 
science. 1251788 
The International Wheat Genome Sequencing 


Consortium (IWGSC), Appels R, Eversole K et al. 
(2018) Shifting the limits in wheat research and 
breeding using a fully annotated reference genome. 
Science 361:eaar7191.  https://doi.org/10.1126/sci- 
ence.aar7191 

Thind AK, Wicker T, Simková H etal (2017) Rapid 
cloning of genes in hexaploid wheat using cultivar- 


specific long-range chromosome assembly. Nat 
Biotechnol 35:793-796. https://doi.org/10.1038/ 
nbt.3877 


Upadhyaya NM, Mago R, Panwar V etal (2021) 
Genomics accelerated isolation of a new stem 
rust avirulence gene—wheat resistance gene pair. 
Nat Plants 7:1220-1228. https://doi.org/10.1038/ 
s41477-021-00971-5 

Van de Weyer AL, Monteiro F, Furzer OJ, Nishimura 
MT, Cevik V, Witek K, Jones JDG, Dangl JL, Weigel 
D, Bemm F (2019) A species-wide inventory of 
NLR genes and alleles in Arabidopsis thaliana. Cell 
178(5):1260-1272.e14. https://doi.org/10.1016/]. 
cell.2019.07.038 


10 Rapid Cloning of Disease Resistance Genes in Wheat 


Várallyay É, Giczey G, Burgyán J (2012) Virus-induced 
gene silencing of Mlo genes induces powdery mil- 
dew resistance in Triticum aestivum. Arch Virol 
157:1345-1350 

Voichek Y, Weigel D (2020) Identifying genetic variants 
underlying phenotypic variation in plants without 
complete genomes. Nat Genet 52(5):534—540. https:// 
doi.org/10.1038/s41588-020-0612-7 

Walkowiak S, Gao L, Monat C etal (2020) Multiple 
wheat genomes reveal global variation in mod- 
ern breeding. Nature 588:277-283.  https://doi. 
0rg/10.1038/s41586-020-2961-x 

Wang Y, Cheng X, Shan Q, Zhang Y, Liu J, Gao C, Qiu 
J-L (2014) Simultaneous editing of three homoeo- 
alleles in hexaploid bread wheat confers heritable 
resistance to powdery mildew. Nat Biotechnol 32:947 

Wang H, Sun S, Ge W etal (2020a) Horizontal gene 
transfer of Fhb7 from fungus underlies Fusarium head 
blight resistance in wheat. Science 368:eaba5435. 
https://doi.org/10.1126/science.aba5435 

Wang H, Zou S, Li Y et al (2020b) An ankyrin-repeat and 
WRKY-domain-containing immune receptor confers 
stripe rust resistance in wheat. Nat Commun 11:1353. 
https://doi.org/10.1038/s41467-020-15139-6 

Wang L, Zhu T, Rodriguez JC etal (2021) Aegilops 
tauschii genome assembly Aet v5.0 features greater 
sequence contiguity and improved annotation. Genes 
Genom Genet 11:jkab325. https://doi.org/10.1093/ 
g3journal/jkab325 

Wicker T, Gundlach H, Spannagl M et al (2018) Impact 
of transposable elements on genome structure and 
evolution in bread wheat. Genome Biol 19:103. 
https://doi.org/10.1186/s13059-018-1479-0 

Wulff BB, Krattinger SG (2022) The long road to 
engineering durable disease resistance in wheat. 
Curr Opin Biotechnol 73:270-275.  https://doi. 
org/10.1016/j.copbio.2021.09.002 

Xie J, Guo G, Wang Y et al (2020) A rare single nucleo- 
tide variant in Pm5e confers powdery mildew resist- 
ance in common wheat. New Phytol 228:1011—1026. 
https://doi.org/10.1111/nph.16762 

Xing L, Hu P, Liu J et al (2018) Pm21 from Haynaldia 
villosa encodes a CC-NBS-LRR protein confer- 
rng powdery mildew resistance in wheat. Mol 
Plant 11:874-878. https://doi.org/10.1016/j. 
molp.2018.02.013 

Yahiaoui N, Srichumpa P, Dudler R, Keller B (2004) 
Genome analysis at different ploidy levels allows 
cloning of the powdery mildew resistance gene Pm3b 
from hexaploid wheat: Positional cloning of Pm3 
from hexaploid wheat. Plant J 37:528—538. https:// 
doi.org/10.1046/j.1365-313X.2003.01977.x 

Yan X, Li M, Zhang P et al (2021) High-temperature 
wheat leaf rust resistance gene Lr/3 exhibits pleio- 
tropic effects on hybrid necrosis. Mol Plant 14:1029— 
1032. https://doi.org/10.1016/j.molp.2021.05.009 

Yao E, Blake VC, Cooper L et al (2022) GrainGenes: a 
data-rich repository for small grains genetics and 
genomics. Database 2022:baac034.  https://doi. 
org/10.1093/database/baac034 


211 


Yu J, Pressoir G, Briggs W et al (2006) A unified mixed- 
model method for association mapping that accounts 
for multiple levels of relatedness. Nat Genet 38:203- 
208. https://doi.org/10.1038/ng1702 

Yu G, Matny O, Champouret N et al (2022) Aegilops 
sharonensis genome-assisted identification of stem 
rust resistance gene Sr62. Nat Commun 13:1607. 
https://doi.org/10.1038/s41467-022-29132-8 

Yuan C, Wu J, Yan B etal (2018) Remapping of the 
stripe rust resistance gene Yr/0 in common wheat. 
Theor Appl Genet 131:1253-1262. https://doi. 
org/10.1007/s00122-018-3075-9 

Zhang W, Chen S, Abate Z et al (2017) Identification and 
characterization of Sr/3, a tetraploid wheat gene that 
confers resistance to the Ug99 stem rust race group. 
Proc Natl Acad Sci USA 114. https://doi.org/10.1073/ 
pnas.1706277114 

Zhang C, Huang L, Zhang H etal (2019) An ances- 
tral NB-LRR with duplicated 3'UTRs confers stripe 
rust resistance in wheat and barley. Nat Commun 
10:4023. https://doi.org/10.1038/s41467-019-11872-9 

Zhang J, Hewitt TC, Boshoff WHP et al (2021a) A recom- 
bined $726 and Sr6/ disease resistance gene stack in 
wheat encodes unrelated NLR genes. Nat Commun 
12:3378. https://doi.org/10.1038/s41467-021-23738-0 

Zhang Z, Running KLD, Seneviratne S et al (2021b) A 
protein kinase—major sperm protein gene hijacked by 
a necrotrophic fungal pathogen triggers disease sus- 
ceptibility in wheat. Plant J 106:720—732. https://doi. 
org/10.1111/tpj.15194 

Zhang L, Liu Y, Wang Q etal (2022a) An alternative 
splicing isoform of wheat TaYRG/ resistance pro- 
tein activates immunity by interacting with dynamin- 
related proteins. J Exp Bot erac245 https://doi. 
org/10.1093/jxb/erac245 

Zhang X, Wang G, Qu X et al (2022b) A truncated CC-NB- 
ARC gene TaRPPI3LI-3D positively regulates pow- 
dery mildew resistance in wheat via the RanGAP-WPP 
complex-mediated nucleocytoplasmic shuttle. Planta 
255:60. https://doi.org/10.1007/s00425-022-03843-0 

Zhao G, Zou C, Li K et al (2017) The Aegilops tauschii 
genome reveals multiple impacts of transposons. 
Nat Plants 3:946-955.  https://doi.org/10.1038/ 
s41477-017-0067-8 

Zhou Y, Bai S, Li H etal (2021) Introgressing the 
Aegilops tauschii genome into wheat as a basis for 
cereal improvement. Nat Plants 7:774—786. https:// 
doi.org/10.1038/s41477-021-00934-w 

Zhu T, Wang L, Rodriguez JC etal (2019) Improved 
genome sequence of wild emmer wheat Zavitan with 
the aid of optical maps. Genes Genom Genet 9:619— 
624. https://doi.org/10.1534/g3.118.200902 

Zhu T, Wang L, Rimbert H et al (2021) Optical maps 
refine the bread wheat Triticum aestivum cv. Chinese 
Spring genome assembly. Plant J 37:528—538. https:// 
doi.org/10.1046/j.1365-313X.2003.01977.x 

Zimin AV, Puiu D, Hall R et al (2017a) The first near- 
complete assembly of the hexaploid bread wheat 
genome, Triticum aestivum. GigaScience 6. https:// 
doi.org/10.1093/gigascience/gix097 


212 K.L.D. Running and J. D. Faris 


Zimin AV, Puiu D, Luo M-C et al (2017b) Hybrid assem- Zou S, Wang H, Li Y et al (2018) The NB-LRR gene 


bly of the large and highly repetitive genome of Pm60 confers powdery mildew resistance in wheat. 
Aegilops tauschii, a progenitor of bread wheat, with New Phytol 218:298—309. https://doi.org/10.1111/ 
the MaSuRCA mega-reads algorithm. Genome Res nph. 14964 


27:787-792. https://doi.org/10.1101/gr.213405.116 


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License 
(http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in 
any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to 
the Creative Commons license and indicate if changes were made. 

The images or other third party material in this chapter are included in the chapter's Creative Commons license, 
unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative 
Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you 
will need to obtain permission directly from the copyright holder. 


Genomic Insights on Global Journeys of Adaptive Wheat
Genes that Brought Us to Modern Wheat


Deepmala Sehgal, Laura Dixon, Diego Pequeno,
Jessica Hyles, Indi Lacey, Jose Crossa, Alison Bentley 
and Susanne Dreisigacker 


Abstract 


Since its first cultivation, hexaploid wheat has 
evolved, allowing for its widespread cultiva- 
tion and contributing to global food security. 
The identification of adaptive genes, such
as vernalization and photoperiod response
genes, has played a crucial role in optimiz- 
ing wheat production, being instrumental 
in fine-tuning flowering and reproductive 
cycles in response to changing climates and 
evolving agricultural practices. While these 
adaptive genes have expanded the range 
of variation suitable for adaptation, further 
research is needed to understand their mech- 
anisms, dissect the pathways involved, and 
expedite their implementation in breeding 
programs. By analyzing data across differ- 
ent environments and over time, Meta-QTL 
analysis can help identify novel genomic 
regions and facilitate the discovery of new 
candidate genes. This chapter reports on two 
previously unknown Meta-QTL regions, 
highlighting the potential for further explora- 
tion in this field. Moving forward, it will be 
increasingly important to expand our under- 
standing of how genetic regions influence 
not only flowering time but also other devel- 
opmental traits and their responses to envi- 
ronmental factors. Advances in gene-based 
modeling hold promise for describing growth 
and development processes using QTL and 
other genomic loci analysis. Integrating these 
findings into process-based crop models can 
provide valuable insights for future research. 
Overall, the study of adaptive genes and their
impact on wheat production represents a vital
area of research that continues to contribute 
to global food security. 


Keywords


11.1 Historical Perspective

Archaeological evidence suggests that hexaploid 
wheat was first cultivated in the Fertile Crescent 
of the Middle East around 7000 BC and that 
farming spread to Europe (former Yugoslavia, 
Bulgaria, Greece) approximately one thousand 
years later (Hillman 1972; Renfrew 1973). By 
approximately 4000 BC, wheat production had 
reached China, with archaeological isotope analysis
suggesting that diets shifted from a dominance
of the C4 crop millet to C3 cereals including wheat
(Li et al. 2007; Cheung et al. 2019). A coinciding
change in climate, whereby conditions became colder
and drier, is proposed to have led to the adoption
of wheat because of its greater flexibility in sowing
time to achieve yield (Cheung et al. 2019). The
ancient Greek poet Hesiod
described an awareness of the importance of the 
seasonal timing of wheat development as early 
as 800 BC in Greece (Aitken 1974), and approx- 
imately two thousand years later, French scien- 
tist Réaumur constructed a thermometer and 
showed that crop maturity was influenced by 
temperature (Réaumur 1735). In 1751 Carl von 
Linné published a floral calendar in Philosophia 
Botanica, observing that plant responses to the 
environment varied in different climates (Linné 
and Freer 2007), and since then, multiple lines of
evidence for variation in flowering time due to
temperature, daylength, and latitude have been
reported (Aitken 1974). From the eighteenth
century, bread wheat (Triticum aestivum L.)
has been grown on all continents except Antarctica,
and to ensure successful cultivation in different
environments, wheat breeding (hybridization
and selection to achieve adaptation) began.


11.1.1 Early Breeding and Selection 
for Seasonal Adaptation 


In France in 1743, the seed merchant Jeanne 
Claude Geoffroy and botanist Pierre d’Andrieux
founded a seed company which began the 
Vilmorin family dynasty of wheat breeding 
that lasted more than 200 years. There is evi- 
dence that pedigree-based breeding was used at 
the Vilmorin company from 1840, with selec- 
tion of seeds based on evaluation of progeny 
performance (Gayon and Zallen 1998). Henry 
de Vilmorin described the importance of wheat 
adaptation in Les meilleurs blés (“The best
wheat”) which illustrated the morphology, ori- 
gin, adaptation, and best agronomic practice for 
different varieties. His astute preface included, 
“one of the best ways to increase harvests with- 
out increasing expenditure is to cultivate the 
breeds of wheat which are best suited to the cir- 
cumstances in which the land is cultivated” and to 
“choose knowingly the most advantageous wheat 
in each locality” (Vilmorin 1880). Vilmorin had 
begun hybridization experiments in 1873, includ- 
ing the use of wheat which had been selected by 
Scottish agriculturalist Patrick Shirreff (Vilmorin 
1880). Vilmorin’s first variety, DATTEL, released
10 years later, was the result of crossing an early
maturing, short stature type from France with a
late maturing English wheat. DATTEL became
widely adopted, as resistance to lodging and 
earliness created a uniform crop with high yield 
potential. A string of successful cultivars followed 
including VILMORIN 23, VILMORIN 27, and 
VILMORIN 29 and many others which feature in 
the ancestry of modern wheat (Lupton 1987). 

At approximately the same time, another
European breeder, Wilhelm Rimpau, was also
crossing native types to English Squarehead
wheat for improved yield, using North American
varieties as donors of quality and winter hardiness.
His most successful cultivar, RIMPAU’S
FRUHER BASTARD, was the most widely
grown in Germany for over 50 years after
being released in 1889 (Porsche and Taylor
2001). That same year, pioneer breeder William 
Farrer made his first wheat crosses in Australia. 
Farrer also focused on introgression of wheat 
to improve quality and adaptation. He crossed 
European Purple Straw with Canadian Fife and 
Indian wheat, and the resulting early maturing 
cultivars were successful in Australia because 
the short life cycle avoided water-limiting con- 
ditions in summer and escaped rust infection. 
Farrer cultivars went on to dominate Australian 
wheat production in the early 1900s, on the 
basis that “He recognized that the characteris- 
tics of a variety limited its successful growth to 
certain localities, and therefore set himself the 
task of breeding varieties adapted to the differ- 
ent conditions” (Guthrie 1922). 

The Canadian “hard” wheat FIFE used 
for crossing by Farrer created cultivars with 
increased dough strength relative to the soft white 
wheats traditionally used for baking. Initially, 
Farrer wheat was met with resistance from mill- 
ers. From the Rust in Wheat Conference in 
Melbourne, 1896, “A prominent obstacle this 
Conference has met with has arisen from the 
objection of millers. The opinion this Conference 
has long held is that the opposition of millers to 
such wheats has no legitimate foundation but 
arises from either misconception or from conserv- 
atism” (Guthrie 1922). Australian millers realized 
the superior quality of Farrer wheat only when 
American wheat of the same type was imported 
to Australia to meet local demand (Guthrie 1922). 

Other breeders were also crossing Canadian 
FIFE and INDIAN wheat. In Canada, Percy 
Saunders crossed RED FIFE and HARD 
RED CALCUTTA, and the resulting culti- 
var MARQUIS was selected and released by 
Charles Saunders in 1908. With excellent qual- 
ity and adaptation through early maturity, 
MARQUIS dominated Canadian production 
and became the gold standard for quality classi- 
fication (Lupton 1987; McCallum and DePauw 
2008). The overwhelming popularity of 
MARQUIS (and two later releases, THATCHER 
and NEEPAWA) highlights a negative
consequence if few adapted cultivars are widely
used, that is, a decline of genetic diversity in the 
breeding pool over time (Fu and Dong 2015). 


11.1.2 Expanding Knowledge 
of Seasonal Patterns 


Fluctuating patterns of seasonal flowering time 
have been well documented across plant species 
(Andrés and Coupland 2012). Early work by 
Garner and Allard (1920) described the relation- 
ship between daily light duration, plant growth, 
and reproduction across several plant species. 
Their work demonstrated that daily light dura- 
tion impacted both the rate and extent of growth 
as well as the time to reach and complete flower- 
ing and reproduction (Garner and Allard 1920). 
In wheat, Chinoy (1950) demonstrated that 
long days induced earlier onset of the reproduc- 
tive phase with both cold (vernalization) and 
light (photoperiod) having measurable impact 
on development and growth. It was proposed 
that the first wheats (which were domesticated
in the Fertile Crescent) shared both vernaliza- 
tion requirements and photoperiod responses 
with their progenitors, but that selection for 
alternative adaptation facilitated the spread of 
wheat throughout Europe, and then worldwide 
(reviewed by Cockram et al. 2007). 

Major genes controlling vernalization (a positive
vernalization response from the wheat variety
INSIGNIA 49; Pugsley 1963) and photoperiod
(from the Canadian variety SELKIRK; Pugsley 1965)
were detected in segregating populations, providing
evidence for simple inheritance. This offered the
opportunity to apply selection for daylength specificity
(Pugsley 1965), although additional genetic
controllers were hypothesized. Genes for response
to daylength, Photoperiod-1 (PPD1), were mapped
to the homoeologous group 2 chromosomes 
(Law et al. 1978) and studies by Martinic (1975) 
and Hunt (1979) demonstrated the prevalence of 
photoperiod-sensitive winter wheat in northern 
latitudes and photoperiod insensitivity in southern
Europe.
Creation of near-isogenic wheat lines capturing
PPD1 variation by Worland and Law (1986)
and Worland et al. (1998) confirmed genetic
effects and allowed their environmental
performance to be understood throughout Europe.
This demonstrated a yield disadvantage from 
earlier flowering in the UK, a moderate advan- 
tage in Germany, and a significant advantage in 
southern Europe (based on testing in the former 
Yugoslavia). These effects have been further 
elaborated, with Borner et al. (1993) confirming
that middle European varieties benefit from daylength
sensitivity (conferred by PPD1), whereas
insensitivity offers productivity-related increases 
where wheat experiences hot and dry summer 
conditions. Hoogendoorn (1985) assessed phe- 
notypic response to photoperiod and vernaliza- 
tion in a collection of 33 wheat varieties from a 
range of geographies. This confirmed a preva- 
lence of photoperiod sensitivity in varieties from 
Europe and North America and insensitivity in 
varieties originating from Mexico, India, and 
Australia. 

Since an understanding of the major controllers
of photoperiod response and vernalization
requirement in wheat was developed, variation
for the PPD1 and Vernalization-1 (VRN1) genes
has aided wheat’s adaptation to a wide range
of global production environments (Sheehan
and Bentley 2020). In many geographies, there
is a documented progression of wheat cultivation
across climatic features and areas, including
in North America (Olmstead and Rhode
2011), Asia, the Mediterranean, North Africa
(Ortiz Ferrara et al. 1998), and China (Yang et al.
2009).


11.1.3 Further Adaptive Progress 
Through Time 


Farrer was the first to target early maturity to breed
adapted wheat for Australia, although the introduction
of additional genetic diversity for phenology
had been identified (Eagles et al. 2009).
In 1945, Australian breeder Walter Lawry
Waterhouse crossed hexaploid wheat with
an early maturing durum wheat, GAZA, producing
an important cultivar, GABO. This
daylength insensitive wheat was the leading cultivar
in Australia for many years, and its sister line
TIMSTEIN was also successfully cultivated
in the USA. This germplasm was utilized by the
International Maize and Wheat Improvement
Center (CIMMYT) in the breeding of cultivars
such as CAJEME, MAYO, and NAINARI
(Watson and Frankel 1972).

Other donors of photoperiod insensitiv- 
ity have been traced to Japanese landrace 
AKAGOMUGHI (which also carried dwarf- 
ing gene, RHT8) and Chinese landraces 
MAZHAMAI and YOUZIMAI (Yang et al.
2009). Yang et al. (2009) showed that the
distribution of alleles for daylength sensitivity
depended upon the climate (temperature and 
latitude) where the wheat was cultivated. The 
adoption of photoperiod insensitive wheat by 
CIMMYT was key to the success of the shut- 
tle-breeding program, whereby material could 
undergo selection in multiple environments due 
to broad adaptation (Trethowan et al. 2007). The
subsequent sharing of germplasm during
the “Green Revolution” likely facilitated
the spread of alleles for daylength insensitivity
around the globe.

As the climate changes over time and new 
crop management practices are developed, it 
is probable that new genetic variation will be 
required for enhancing adaptation (Hunt et al. 
2019). For instance, studies have highlighted a
shift to early sowing, which has meant that
vernalization-responsive, long-season wheats are
beneficial in some areas of southern Australia
where spring types are traditionally cultivated
(Hunt 2017; Cann et al. 2020). To expedite
development of future adapted wheat cultivars, 
it is important to understand the genetic archi- 
tecture of phenology and develop breeding tools 
such as molecular markers and simulation models
for prediction.


11.2 Understanding the Genetic
Control of the Synchrony
of Flowering


11.2.1 The Three Known Gene 
Systems 


As outlined above, within less than 
10,000 years, wheat cultivation has expanded 
from its primary area of evolution within the 
Fertile Crescent to a broad spectrum of
agro-ecologies around the globe, adapting rapidly to a
wide range of climatic conditions (Curtis 2002;
Salamini et al. 2002). The essential path to
achieve adaptation is the synchrony of flowering 
which in wheat is controlled by three major gene 
systems: (1) the VRN genes (exposure to cold 
temperature), (2) the PPD response genes (sen- 
sitivity to daylength), and (3) the autonomous 
earliness per se (EPS) genes (Kato and Yanagata 
1988). The adaptation of a wheat genotype to a 
particular environment depends to a large extent 
on the interaction of these three systems. 


11.2.2 Vernalization (VRN) Genes 


Vernalization is the acquisition of a plant's 
ability to flower by exposure to cold (Chouard 
1960). According to the vernalization require- 
ment of a genotype, wheat is classified as hav- 
ing a winter or spring growth habit. Winter 
wheat has a considerable vernalization require- 
ment, but spring wheat may be insensitive or 
only partly sensitive to vernalization (Trevaskis 
et al. 2003). The key element of the vernalization
gene system is VRN1 with its three orthologous
genes (VRN-A1, VRN-B1, and VRN-D1)
located on the long arms of chromosomes 5A,
5B, and 5D, respectively (Figs. 11.1 and 11.2a).
VRN1 is a member of the MADS-box transcription
factor family, which has been shown
to play a critical role in flowering gene models
across crops (Zhao et al. 2006). The MIKC-type
MADS-box proteins have a highly conserved
MADS DNA-binding domain, an intervening
(I) domain, a keratin-like (K) domain, and
a C-terminal domain (C). The proteins bind as
dimers to DNA sequences named “CArG” boxes
and organize in tetrameric complexes (Li et al.
2019). The multimeric nature of these complexes
generates many combinatorial possibilities
with different targets and functions (Li et al.
2019; Honma and Goto 2001; Theißen et al.
2016).

Mutations in the promoter and deletions in
the large first intron of VRN1 are associated with
increased expression of the genes in the absence
of cold, accelerated flowering without vernalization,
and thus spring growth habit (Kippes et al.
2018). Additionally, single nucleotide polymorphisms
(SNPs) in exons 4 and 7 have been identified
in VRN-A1 (Eagles et al. 2011; Muterko
and Salina 2018). The exon 4 SNP results in
an amino acid change (Leu117 → Phe117) in
the conserved K domain (Eagles et al. 2011;
Chen et al. 2009; Díaz et al. 2012). This polymorphism
was associated with a change in the
number of days to stem elongation, vernalization
requirement duration, frost tolerance, and
flowering time in winter wheat (Chen et al.
2009; Muterko et al. 2016; Dixon et al. 2019).
Another VRN-A1 SNP that causes an amino acid
substitution (Ala180 → Val180) in exon 7 in the
C-terminal domain also regulates vernalization
duration, via its regulation of a protein interaction
with TaHOX1 (Li et al. 2013).
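
As a rough illustration of the association described above between VRN1 promoter mutations or first-intron deletions and spring growth habit, the following Python sketch encodes that single rule. The class name, field names, and function are hypothetical and chosen only for illustration. The rule is a deliberate simplification: it ignores the exon SNPs, copy number, the graded effects of different homoeologs, and all other flowering genes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vrn1Allele:
    """Hypothetical record of one VRN1 homoeolog (illustrative names only)."""
    locus: str                       # e.g. "VRN-A1", "VRN-B1", "VRN-D1"
    promoter_mutation: bool = False  # mutation in the promoter region
    intron1_deletion: bool = False   # deletion in the large first intron

    @property
    def expressed_without_cold(self) -> bool:
        # Simplifying assumption: either lesion is treated as sufficient
        # for increased expression in the absence of cold.
        return self.promoter_mutation or self.intron1_deletion


def predicted_growth_habit(alleles):
    """Return 'spring' if any homoeolog carries such a lesion, else 'winter'."""
    if any(a.expressed_without_cold for a in alleles):
        return "spring"
    return "winter"


# Two made-up genotypes used only to exercise the rule.
winter_line = [Vrn1Allele("VRN-A1"), Vrn1Allele("VRN-B1"), Vrn1Allele("VRN-D1")]
spring_line = [
    Vrn1Allele("VRN-A1", promoter_mutation=True),
    Vrn1Allele("VRN-B1"),
    Vrn1Allele("VRN-D1"),
]

print(predicted_growth_habit(winter_line))
print(predicted_growth_habit(spring_line))
```

A binary call of this kind would at most be a starting point; real marker-based prediction would need allele-specific effects and the other gene systems discussed in this chapter.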

Beyond regulation by alterations in nucleotide
sequence (INDELs and SNPs), there is
increasing evidence that vernalization in wheat
is also regulated at the epigenetic level. The
VRN-A1 gene can be present as two or more
copies with the assumption that the number of
copies positively correlates with the vernalization
requirement duration and flowering time
of wheat (Díaz et al. 2012). The different nature
of the diverse mutations (promoter insertions,
intron deletions of different size, SNPs) in the
three VRN1 orthologs and gene duplication in
the A genome are the most plausible explanation
for varying gene actions observed (Li et al.
2013). Dominant alleles at VRN-A1 have been
shown to confer the largest effects, leading to
a lack of vernalization requirement, relative to
VRN-B1 and VRN-D1, which reduced vernalization
requirement and defined semi-spring or
facultative types (Trevaskis et al. 2003).

Fig. 11.1 Major flowering genes involved in photoperiod
and vernalization response. The major genes
involved with photoperiod and vernalization responses
in wheat are highlighted in different colors. For each
gene, known allelic variation is included and the effect
of this variation on the level of expression or the flowering
response is shown. The structure of each gene,
along with the annotated domains, is represented on a
gray background bar. Where interactions with uncharacterized
QTL regions are known, these are also included
on the network diagram. Deletions are indicated by a red
oval with a line through it. Different VRN1 alleles can
determine the extent to which vernalization is required
to increase expression, due to CArG box, VRT2 interactions,
and exon variants, including changes in copy
number. A duplication and translocation of VRN1, in the
form of VRN-D4, also promotes spring habit. The locus
VRN2 (ZCCT1 and ZCCT2) is a photoperiod-dependent
repressor of VRN1, competing with CONSTANS and the
nuclear transcription factor Y (NF-Y) proteins to activate
FT1 (also called VRN3) and potentially FT2. The FT
genes interact with FD-like genes (FDL2 or FDL12) to
form a floral activating complex. Copy number variants,
most notably of VRN1 and PPD1, can determine heading
date. PPD1 determines flowering time through photoperiod
sensitivity with variations in promoter deletions and
copy number influencing expression levels. Short-day
promotion of flowering is mediated through FT3 (also
called PPD-2)

Other MADS-box genes also play a role in
the regulation of wheat flowering. VRN-D4 is a
MADS-box transcription factor derived from
the duplication and translocation of the VRN-A1
gene to the short arm of chromosome 5D (Kippes
et al. 2015). Being an extra gene copy, VRN-D4
is associated with increased VRN-A1 expression
and thus reduced vernalization requirement.
The VEGETATIVE TO REPRODUCTIVE
TRANSITION 2 (VRT2) gene belongs to the
same group of MADS-box genes as SHORT VEGETATIVE
PHASE in Arabidopsis and interacts with VRN1.
The VRT2 protein has been shown to bind to the
CArG box in the VRN1 promoter region, suggesting
that VRT2 represses the transcription of
VRN1 (Dubcovsky et al. 2008; Kane et al. 2007).
More recently, Xie etal. (2021) corroborated 
an epistatic interaction between the two genes 
(including the ability of VRT2 to bind to the pro- 
moter region of VRN/), but reported a shared 
upregulation of VRNI and VRT2. 
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Fig. 11.2. Impact of major flowering genes responses 
to different temperatures, in the context of vernaliza- 
tion. This figure represents the role of temperature in the 
regulation of vegetative to reproductive meristem transi- 
tion, and how this relates to the vernalization pathway. 
To indicate the different aspects of the flowering pathway 
and how responses which occur are more influenced by 
specific environmental conditions the pathway has been 


In addition to VRN/J, the VRN2 and VRN3 
genes are located on the long arm of chromo- 
some 5A and the short arm of chromosome 
7B, respectively (Figs. 11.1 and 11.2) The 
VRN2 locus consists of two closely related 
genes (ZCCTI and ZCCT2) that encode pro- 
teins carrying a putative zinc finger and a CCT 
domain (Yan et al. 2004). The CCT domain is a 
43-amino acid region, first described in protein 
sequences of CONSTANS (CO), CONSTANS- 
like (COL), and TIMING OF CABI (TOCI) 
(Putterill et al. 1995; Strayer et al. 2000; Robson 
etal. 2001) that is present in multiple regula- 
tory proteins associated with light signaling, 
circadian rhythms, and photoperiodic flowering 
(Wenkel et al. 2006). VRN2 is the major flower- 
ing repressor identified in wheat. Dominant gene 
action in combination with recessive VRN1 and 
VRN3 allele combinations confers winter wheat 
growth habit. Deletions or mutations involv- 
ing positively charged amino acids at the CCT 
domain are associated with recessive ZCCTI 
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separated into a low temperatures and b high tempera- 
ture (post or non-requiring vernalization), although it 
must be emphasized that each aspect does not act inde- 
pendently. The same gene structure and nomenclature are 
used as for Fig. 11.1. Additionally, the weighting of each 
gene signal is indicated in a seesaw schematic (see also 
Fig. 11.4) 


and ZCCT2 alleles for spring growth habit (Yan 
etal. 2004; Dubcovsky etal. 2005; Distelfeld 
etal. 2009). The CCT domains in ZCCTI, 
ZCCT?2, and CO proteins further interact with 
proteins of the NUCLEAR FACTOR-Y (NF- 
Y) transcription factor family. Mutations in the 
CCT domain of ZCCT proteins also reduce the 
strength of ZCCT-NF-Y interactions and the 
ability of ZCCT1 to compete with CO to acti- 
vate VRN3 (Figs. 11.1 and 11.2b). 

The VRN3 gene encodes a RAF kinase inhibitor-like protein and has
been mapped to the FLOWERING LOCUS T-like gene, often referred to as
FT1 in wheat. VRN3/FT1 is expressed in long days in vernalized plants
or spring types and thus triggers long-day-induced flowering. The
VRN3/FT1 protein has been shown to travel through the phloem,
carrying the photoperiodic signal from the leaves to the shoot apex,
where it forms a protein complex that binds to the promoter of VRN1,
promoting its further expression. Dubcovsky et al. (2008)
demonstrated interaction of VRN3/FT1 with the FT2, FDL2, and FDL13
proteins. Transgenic plants showed that increased transcript levels
of FT2 (an FT paralogue) provide transcriptional activation of VRN1.
VRN3/FT1 therefore integrates the vernalization and photoperiod
response gene systems. High levels of VRN3/FT1 expression can
overcome the vernalization requirement and are associated with spring
growth habit (Figs. 11.1 and 11.2b) (Yan et al. 2006).


11.2.3 Photoperiod (PPD) Response
Genes


Photoperiod genes promote the floral transition in response to long
days (Searle and Coupland 2004). Photoperiod-sensitive wheat has a
long-day phenotype: it flowers earlier when the days are longer than
a critical threshold. Photoperiod-insensitive wheat flowers largely
independently of daylength and can be grown to maturity in long- or
short-day environments. Photoperiod response is mainly controlled by
the semi-dominant homoeologous PPD1 genes on the short arm of
chromosome group 2 (Law et al. 1978; Welsh et al. 1973). PPD1 belongs
to a pseudo-response regulator (PRR) gene family, which is
characterized by a pseudo-receiver domain near the amino-terminus and
a 43 amino acid CCT domain near the carboxy-terminus of the protein
(Mizuno and Nakamichi 2005). Wild-type alleles of PPD1 (PPD-1b) have
a rhythmic diurnal pattern of rather low gene expression and are
associated with daylength sensitivity (Figs. 11.1 and 11.3).
Non-wild-type alleles of PPD1 (PPD-1a) alter the expression of the
gene, leading to elevated transcription throughout the day and
accelerated flowering through elevated FT1 expression (Kitagawa
et al. 2012). This can substitute for long days and reduce daylength
sensitivity.

Fig. 11.3 Impact of major flowering gene responses to different
daylengths. The role of daylength is represented in the regulation of
vegetative to reproductive meristem transition. To indicate the
different aspects of the flowering pathway and how responses which
occur are more influenced by specific environmental conditions, the
pathway has been separated into a long-day, b short-day, although it
must be emphasized that each aspect does not act independently. The
same gene structure and nomenclature are used as for Fig. 11.1.
Additionally, the weighting of each gene signal is indicated in a
seesaw schematic (see also Fig. 11.4)

Several non-wild-type, photoperiod-insensitive alleles are known for
PPD1. At the PPD-D1 locus, a 2 kb deletion upstream of the coding
region of the gene confers photoperiod insensitivity of semi-dominant
type (Beales et al. 2007). This mutation has been recognized as the
major source of earliness in wheat varieties worldwide. Tanio and
Kato (2007) described a PPD-B1a mutation from the Japanese cultivar
FUKUWASEKOMUGI and Nishida et al. (2013) characterized a PPD-B1a
allele (a 308 bp insertion in the 5'-upstream region) derived from
the Japanese landrace “SHIROBOR21”. No genetic locus for PPD-A1 has
been defined in hexaploid wheat. However, Wilhelm et al. (2009)
described two photoperiod-insensitive alleles from tetraploid wheat:
“GS-100” PPD-A1a and “GS-105” PPD-A1a. These alleles have deletions
of 1027 bp (“GS-100”) and 1117 bp (“GS-105”) in a similar region of
the upstream promoter to PPD-D1a.

Nishida et al. (2013) described a PPD-A1a mutation (1085 bp deletion
in the 5'-upstream region) in the Japanese hexaploid wheat cultivar
CHIHOKUKOMUGI, which is in a similar location to the deletions
described by Wilhelm et al. (2009) but appears to be unique to
Japanese wheat. In addition to photoperiod-insensitive mutations,
Beales et al. (2007) identified candidate null alleles for PPD-A1 and
PPD-D1 in photoperiod-sensitive cultivars. These loss-of-function
alleles delay flowering, in association with reduced expression of
FT1, similar to the wild-type alleles (Shaw et al. 2013).

Similar to VRN1, there is also variation among the potencies of the
three PPD-1a loci, where plants with PPD-A1a and PPD-D1a are earlier
in flowering than plants with PPD-B1a. Díaz et al. (2012) and
Würschum et al. (2015) showed that alleles of PPD-B1 were associated
with increased copy number resulting in earlier flowering. These
results, along with multiple copies of VRN1, confirm that copy-number
variation is important for the adaptation of wheat.

More recently, three candidate genes for PPD2 and PPD3 (also
designated as FT3-B1, FT3-D1, and TOE-B1) controlling the short-day
flowering pathway were identified on the long arm of chromosome
group 1 in wheat (Zikhali et al. 2014, 2017; Halliwell et al. 2016).
Four variants were observed for FT3-B1: the wild-type allele, a
complete deletion of the gene, a SNP in exon 3 causing an amino acid
change (Gly → Ser), and copy-number variants. Both the deleted and
mutated alleles confer delayed flowering under short-day photoperiod
(Figs. 11.1 and 11.3). At the FT3-D1 locus, a SNP in exon 4 was
identified. The candidate gene for PPD3, TOE-B1, is still
speculative. SNPs in exons 1 and 9 of the TOE1-B1 gene were shown to
separate earlier flowering from later flowering cultivars, suggesting
that the gene is a putative flowering time repressor, while the
mutant allele is expected to confer earliness (Zikhali et al. 2017).
The role of each gene in the response of floral meristem development
to the different environmental signals is summarized in Fig. 11.4.


11.2.4 Earliness Per Se (EPS) Genes 


The photoperiod and vernalization gene systems allow the coarse
tuning of adaptation. However, there are still relatively minor
variations in flowering time once requirements of vernalization and
photoperiod are fully satisfied. These differences are regulated by
earliness per se (EPS) genes, usually of small effect but critical
for fine-tuning developmental phases in the crop cycle (Zikhali and
Griffiths 2015; Griffiths et al. 2009). The genetics of EPS is still
not well understood, and underlying genes with causal polymorphisms
have only recently been identified in hexaploid wheat (Zikhali et al.
2017). In the wild species Triticum monococcum L., a cereal ortholog
of the Arabidopsis thaliana circadian clock regulator LUX
ARRHYTHMO/PHYTOCLOCK 1 (LUX/PCL1) was proposed as a promising
candidate gene for the earliness per se 3 (Eps-3Am) locus, and an
ortholog of the circadian clock regulator EARLY FLOWERING 3 (ELF3)
was identified as a candidate gene for the earliness per se Eps-Am1
locus (Gawroński et al. 2014; Alvarez et al. 2016). ELF3 was
suggested to be the best candidate gene within the EPS-D1 locus in
hexaploid wheat as a deletion containing ELF3 is associated with
advanced flowering (Zikhali et al. 2016). Recently, two additional
EPS QTL in hexaploid wheat located on chromosomes 2B and 7D,
designated EPS-B2 and EPS-D7, were identified (Basavaraddi et al.
2021a), and for the first time, an interaction between both genes
could be shown. EPS genes owe their name to the assumption that they
act independently of the environment. Despite this, Eps × temperature
interaction was recently demonstrated in some instances (Ochagavía
et al. 2019; Prieto et al. 2020; Basavaraddi et al. 2021b). In
barley, the EPS gene ELF3 has been shown to play a role in the
response of circadian clock genes to temperature (Ford et al. 2016).

Fig. 11.4 Summary of the roles of different environmental signals on
floral meristem development. Different environmental signals are used
by plants to regulate the timing and rate of floral meristem
development. The relative proportions of the expressed genes are
shown in the seesaw summary figures, and the impact of these
expression patterns on the floral developmental stage is indicated by
the tipping of the seesaw balance. Environmental conditions which are
considered are a low temperatures, b high temperatures (post or
non-requiring vernalization), c long day, and d short day


11.3 Quantitative Trait Loci (QTL)
for Flowering Time


The selection of spring and photoperiod-insensitive type cultivars
during the evolutionary and breeding history of wheat preceded any
methods for gene identification. However, the identification of the
causal genes in the last two decades was important to enable targeted
selection and therefore the potential for the directed development of
new cultivars. One of the major methods utilized in the process of
genetic mapping and gene identification is quantitative trait loci
(QTL) analysis. QTL analysis is a powerful statistical tool used to
calculate the probability of any marker within a genetic map
contributing to the observed phenotype. The resolution and
reliability of this method are increased by larger mapping
populations, as these capture more recombination events. The
resolution is also increased through an even distribution of markers
in the genetic map; however, this is dependent on polymorphisms
between the parent genotypes and can be severely limited when
diversity is low. This is regularly observed for the D-genome of
wheat or when mapping populations are generated between cultivars
with a recent shared pedigree. Individual QTL analyses to identify
flowering time genes have been conducted for a vast number of mapping
populations under a large and diverse set of environmental conditions
(for details refer to the next section). These have identified
certain genetic hot spots for flowering time regulation, including
the regions of major genes previously mentioned, e.g., on chromosomes
5A (VRN-A1) along with 7B (VRN-B3) and 2D (PPD-D1). Within these hot
spot regions, it is apparent that multiple genes which regulate
flowering time are closely genetically associated. The indication of
these genetic hubs, combined with the dominance of the PPD1 and VRN1
genes in flowering regulation, suggests that there could be value in
assessing the identified QTL for flowering through a meta-QTL (MQTL)
analysis. This analysis would identify the number and genetic range
of QTL beyond PPD1 and VRN1 and, in combination with location
information, infer some climate-based associations for these
additional QTL.

Fig. 11.5 QTL for flowering time-related traits and known major genes
projected on the IWGSC RefSeq v1.0. The QTL are shown from short arm
(top) to long arm (bottom). Centromeres are presented by blue ovals
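
The idea that QTL analysis scores each marker for its contribution to the phenotype can be illustrated with a minimal single-marker regression scan. This is only a sketch under stated assumptions: the data are simulated, the markers are treated as independent (no linkage map or interval mapping), and the function name `single_marker_lod`, the marker index, and all effect sizes are hypothetical choices made for illustration, not values from the studies discussed here.

```python
import numpy as np

def single_marker_lod(genotypes, phenotype):
    """LOD score per marker from a simple regression of phenotype on genotype."""
    n = len(phenotype)
    y = phenotype - phenotype.mean()
    rss0 = float(y @ y)                      # residual sum of squares, no-QTL model
    lods = []
    for g in genotypes.T:                    # iterate over marker columns
        x = g - g.mean()
        sxx = float(x @ x)
        if sxx == 0.0:                       # monomorphic marker carries no information
            lods.append(0.0)
            continue
        sxy = float(x @ y)
        rss1 = rss0 - sxy * sxy / sxx        # RSS after fitting the marker effect
        lods.append(0.5 * n * np.log10(rss0 / rss1))
    return np.array(lods)

# Simulated population: 200 lines, 50 biallelic markers coded 0/1 (hypothetical)
rng = np.random.default_rng(1)
geno = rng.integers(0, 2, size=(200, 50)).astype(float)
# Hypothetical QTL at marker 17 shifting days to heading by 6 days
days_to_heading = 120.0 + 6.0 * geno[:, 17] + rng.normal(0.0, 3.0, size=200)

lod = single_marker_lod(geno, days_to_heading)
peak = int(np.argmax(lod))
print(f"peak marker: {peak}, LOD = {lod[peak]:.1f}")
```

Increasing the number of simulated lines sharpens the peak, which mirrors the point made above about larger mapping populations.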


11.3.1 Meta-QTL (MQTL) Analysis 


The results of a total of 18 QTL analyses and genome-wide association
studies (GWAS) conducted on flowering time in bread wheat were
utilized and aligned to the IWGSC-CHINESE SPRING reference sequence
(IWGSC RefSeq v1.0) to identify genetic hot spots or MQTL. The
studies consisted of 17 mapping populations and five GWAS panels. The
traits included were days to heading, days to anthesis, days to
maturity, and earliness per se (Supplementary Table S11.1). In
addition, 24 flowering genes with known physical locations were
integrated (Supplementary Table S11.2). We projected 201 flowering
time QTL with 120, 27, 25, and 29 QTL related to days to heading,
anthesis, maturity, and earliness per se, respectively (Fig. 11.5).
QTL were projected on all chromosomes. The number of projected QTL
per genome was 95 (47.3%), 71 (35.3%), and 59 (29.4%) for A, B, and D
genomes, respectively. The number of QTL per chromosome ranged from
3 QTL on chromosome 1A to 50 QTL on chromosome 5A.

A window size of 30 Mb was used to infer MQTL. Seven MQTL for
flowering time were detected, ranging in size from 10.5 Mb to 28.7 Mb
across chromosomes (Table 11.1). On chromosome 1D, MQTL1 was
identified between 477.9 and 495.1 Mb and has a size of 17.2 Mb.
MQTL2 was located on chromosome 2A between the physical positions of
28.2-43.2 Mb and with a size of 15.0 Mb. On chromosome 2B, MQTL3 was
located between 33.9 and 62.6 Mb and has the largest size of 28.7 Mb.
MQTL4 was located between physical positions of 30.8-49.0 Mb on
chromosome 3B. MQTL5 and MQTL6 were detected on chromosomes 5A and 5B
with sizes of 13.8 and 15.1 Mb, respectively. MQTL5 had the maximum
number of QTL (36) followed by MQTL6 (14). MQTL7 was located on
chromosome 6A between the physical positions of 67.0-77.5 Mb and had
a size of 10.5 Mb.
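
As a small worked check of the interval arithmetic, the MQTL sizes can be recomputed from the physical boundaries given in Table 11.1 and compared with the 30 Mb window. The dictionary layout and variable names below are illustrative assumptions; only the boundary values come from the table. Because the published boundaries are rounded to 0.1 Mb, a recomputed size may differ from a reported size by 0.1 Mb (this is the case for MQTL6).

```python
# Physical intervals (Mb) on IWGSC RefSeq v1.0, copied from Table 11.1
mqtl_intervals = {
    "MQTL1": ("1D", 477.9, 495.1),
    "MQTL2": ("2A", 28.2, 43.2),
    "MQTL3": ("2B", 33.9, 62.6),
    "MQTL4": ("3B", 30.8, 49.0),
    "MQTL5": ("5A", 581.1, 594.9),
    "MQTL6": ("5B", 571.1, 586.3),
    "MQTL7": ("6A", 67.0, 77.5),
}
WINDOW_MB = 30.0  # window size stated in the text

# Size of each meta-QTL interval, rounded to the precision of the table
sizes = {
    name: round(end - start, 1)
    for name, (_chrom, start, end) in mqtl_intervals.items()
}
largest = max(sizes, key=sizes.get)
smallest = min(sizes, key=sizes.get)
all_within_window = all(size <= WINDOW_MB for size in sizes.values())

for name, (chrom, start, end) in mqtl_intervals.items():
    print(f"{name} ({chrom}): {start}-{end} Mb, size {sizes[name]} Mb")
```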

The MQTL provides the advantage of readily separating QTL which are
environmentally more stable, and so might have relevance in many
different locations globally, from those which infrequently occur in
QTL analyses. Using this distinction, marker and candidate gene
identification can be targeted for specific environmental conditions
and so enable the development of a deeper understanding and
application of flowering time regulation.

The QTL most frequently identified in the MQTL analysis were located
on chromosome 5 (A and B genomes) and associated with the VRN1
region, along with the closely associated PHYC gene. A third very
robust QTL region was identified on chromosome 1D, overlapping with
the EARLY FLOWERING 3 (ELF3) gene, and containing 7 QTL.
Additionally, two regions were identified on chromosomes 3B and 6A
where QTL were detected in multiple analyses and do not yet have a
gene associated with them. Both chromosome regions on 3B and 6A are
interesting targets for further investigation. Several QTL were
further identified in, potentially, homoeologous regions. These may
indicate that the same gene on homoeologous chromosomes contributes
to the regulation of flowering time and, therefore, may represent a
stable locus but with dosage effect, commonly seen in wheat. Examples
for these QTL are in the proximal region of chromosome 5 and the
distal region on chromosomes 5, 6, and 7.

Table 11.1 Summary of flowering time meta-QTL positioned on wheat reference genome IWGSC RefSeq v1.0

MQTL    Chromosome   Range of reference genome v1.0 (Mb)   Size (Mb)   Number of QTL   Candidate gene
MQTL1   1D           477.9-495.1                           17.2        7               TaELF3-1D
MQTL2   2A           28.2-43.2                             15.0        3               Ppd1-2A
MQTL3   2B           33.9-62.6                             28.7        3               Ppd1-2B
MQTL4   3B           30.8-49.0                             18.2        4
MQTL5   5A           581.1-594.9                           13.8        36              TaPHYC-5A, Vrn-A1
MQTL6   5B           571.1-586.3                           15.1        14              TaPHYC-5B, Vrn-B1
MQTL7   6A           67.0-77.5                             10.5        4


11.4 The Effect of Major 
Genes on the Response 
to Vernalization 
and Photoperiod 
to Developmental Phases 
and Traits 


Flowering time is an essential trait, and its mistiming can
ultimately lead to partial or complete crop failure. However, the
focus on time to flowering has meant that additional pleiotropic
effects are also selected for, some of which are beneficial. The
dominant regulator of vernalization, VRN1, is an important gene for
the control of the vernalization response and also for the formation
of the flower itself, highlighted by its homology to the Arabidopsis
AP1 gene (Yan et al. 2003). VRN1, in combination with its homologues
FUL2 and FUL3, contributes to the regulation of spikelet formation,
plant height, and tiller progression (Li et al. 2019). Furthermore,
the regulatory roles of VRN1 are not limited to floral regulation.
Growing spring vs. winter near-isogenic lines (NILs) for VRN1 in
barley showed that other traits, including root density at specific
soil depths, were affected: in spring barley NILs, root density
during grain filling was increased at soil depths between 20 and
60 cm, compared to winter NILs (Voss-Fels et al. 2018).
Like VRN1, the regulator of photoperiod response, PPD1, is also
linked to a number of additional phenotypes. Some of these are
closely related to flowering time; for example, the rate of spikelet
initiation is accelerated in PPD1-insensitive NILs, leading to a
reduction in the number of spikelets per spike (Ochagavía et al.
2018). Likewise, the formation of additional or paired spikelets is
also altered depending on the PPD1 allele, a mechanism which is
believed to be regulated through the strength of the FT1 signal
(Boden et al. 2015). Beyond the spike architecture, PPD1 influences
grain filling and dry mass production. In durum wheat (Triticum
turgidum L. var durum), cultivars carrying PPD1 alleles which
conferred photoperiod insensitivity allowed earlier flowering and
more robust grain filling, leading to enhanced yields. This
correlation of effects may not be due to a direct effect of PPD1
regulating these processes, but might be due to PPD1 enabling optimal
timing of flowering for the particular environment (Royo et al. 2016,
2018; Arjona et al. 2020).

Both the photoperiod and vernalization pathways are integrated
through the cereal FT1-like gene. As such, allelic variation of FT1
is, unsurprisingly, associated with variation in spikelet number,
potentially linked with spikelet initiation rate. The link with
spikelet initiation is supported by transgenic lines over-expressing
TaFT1, which flower rapidly while still on the callus regeneration
media and produce a spike with only a few, infertile spikelets (Lv
et al. 2014). In addition to FT1, cereals contain a vastly expanded
family of FT-like genes, which are becoming a focus for
characterization (Bennett and Dixon 2021). FT2 has been linked with
spikelet initiation (Gauley and Boden 2021), while HvFT3 has also
been associated with spikelet initiation in spring lines, independent
of a photoperiod signal (Mulki et al. 2018). Interestingly, while FT3
showed a role in photoperiod-independent spikelet initiation, plants
were unable to complete floral development under short-day
conditions, indicating that FT3 alone cannot promote floral
development (Mulki et al. 2018). Yet, in winter barley,
over-expression of FT3 could trigger the expression of HvVRN1 and
enable floral development in non-vernalized plants (Mulki et al.
2018). In contrast to this, FT4 has been identified to function as a
repressor of spikelet initiation in barley, with over-expression of
HvFT4 leading to a reduction in spikelet primordia and ultimately
grains per spike (Pieper et al. 2021).


11.5 Extending Genetics 
to Prediction 


Flowering time is a critical consideration in the adaptation of wheat
to scenarios of changing environments. Future adaptation of any crop
in its major producing countries must be forecast because of the
substantial time lag in the planning, breeding, and release of new
cultivars, which can take between 6 and 10 years (Tanaka et al. 2015;
Hammer et al. 2020). The delivery of climate-smart solutions for
cultivars to be released in a time-reduced and cost-effective manner
is a daunting challenge for current agricultural research
(Ramirez-Villegas et al. 2020). Based on diverse climate change
models, wheat yields will suffer climate change-related declines
below current production rates in most regions, with the most
negative impact projected to affect developing countries in warmer
regions (Pequeno et al. 2021). For example, a modeling study by
Asseng et al. (2015) predicted a fall of 6% in wheat yield for each
1 °C rise in temperature, with resultant uncertainty in production
over space and time. More recently, Demirhan (2020) estimated a 90.4
million ton drop in global wheat production with a 1 °C warming of
surface temperature, but a 32.2 million ton increase in production
associated with a 1 ppm increase in CO2 emissions. This emphasizes
the complexity of climate change and its relationship with vital
processes in nature.

To mitigate future uncertainties and to reduce the negative
environmental impacts, exploratory simulation models or so-called
“adaptation pathways” can be developed (Tanaka et al. 2015). Optimum
flowering periods, defined by maximum grain yield potential, are
explored by simulating interactions of genotype × environment ×
management (G × E × M) under current and future climates for major
crops including wheat (Pequeno et al. 2021; Zheng et al. 2012; Flohr
et al. 2017; Chen et al. 2020). Thus, statistical and mechanistic
models that enable prediction of the performance of plants cultivated
in various environmental conditions will play a crucial role in
breeding for environmental adaptability and optimization of crop
management.


11.5.1 Finding Conceptional Ideotypes 
for Given Environments 


Diagnostic molecular markers associated with 
the important regulatory genes and QTL related 
to wheat adaptation (as summarized above) pro- 
vide a method for identifying existing allelic 
variation and estimating the effects of each of 
the alleles in a diverse target production environ- 
ment. The estimated allele effects can be used to 
conceptualize ideotypes or genotypes that fit to a 
specific flowering time range or can predict out- 
comes of specific crosses in breeding. 
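
How estimated allele effects might be combined into candidate ideotypes can be sketched with a purely additive model that enumerates allele combinations and keeps those predicted to fall inside a target heading window. Everything numeric here is a hypothetical placeholder: the baseline, the per-allele effects, the allele labels (including "v" for a winter-type allele), and the target window are assumptions for illustration and are not estimates from any of the cited studies. The sketch also ignores epistasis and genotype × environment interaction.

```python
from itertools import product

# Hypothetical baseline and additive allele effects on days to heading
BASELINE_DAYS = 130.0
ALLELE_EFFECTS = {
    "PPD-D1": {"a": -8.0, "b": 0.0},
    "VRN-A1": {"a": -6.0, "w": -2.0, "v": 0.0},
    "VRN-B1": {"a": -3.0, "v": 0.0},
}

def predict_days(genotype):
    """Additive prediction: baseline plus the effect of the allele at each locus."""
    return BASELINE_DAYS + sum(
        ALLELE_EFFECTS[locus][allele] for locus, allele in genotype.items()
    )

def ideotypes_in_range(low, high):
    """Enumerate all allele combinations predicted to head within [low, high] days."""
    loci = list(ALLELE_EFFECTS)
    hits = []
    for alleles in product(*(ALLELE_EFFECTS[locus] for locus in loci)):
        genotype = dict(zip(loci, alleles))
        days = predict_days(genotype)
        if low <= days <= high:
            hits.append((genotype, days))
    return hits

candidates = ideotypes_in_range(118.0, 124.0)
for genotype, days in candidates:
    print(genotype, days)
```

With real data, the effect table would be replaced by allele effects estimated in the target environment.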

Allelic variation in vernalization genes does not contribute to large
differences in flowering time in environments where vernalization
saturation occurs. A large worldwide panel of varieties was evaluated
by Würschum et al. (2018), revealing that a three-component system
facilitated the adaptation of heading date in winter wheat. The
PPD-D1 locus was found to account for almost half of the genetic
variance (the photoperiod-insensitive allele PPD-D1a being mainly
present in eastern and southern European as well as Eurasian
cultivars), followed by copy-number variation at PPD-B1. Further
fine-tuning to local climatic conditions was attributed to
small-effect QTL. Sheehan and Bentley (2020) recently documented a
dialog with UK wheat agronomists, outlining the requirement for
greater flexibility of varietal flowering time (preferably earlier
flowering genotypes) in UK winter wheat to find ideotypes for
expected changing seasonal conditions and increasing seasonal weather
fluctuations.

In spring wheat, Cane et al. (2013) attempted to define a conceptual
genotype or ideotype for environments in southern Australia
characterized by variable rainfall in late autumn and early winter.
The authors suggested that a spring cultivar with slowed development
from early sowing, followed by rapid development with increasing
temperature and daylength, was an optimal type. The authors defined
the allele combination (1) PPD-B1 (3-copy variant) + PPD-D1a +
VRN-A1w (WICHITA allele) + VRN-B1 + VRN-D1a or (2) PPD-B1 (3-copy
variant) + PPD-D1b + VRN-A1w (WICHITA allele) + VRN-B1 + VRN-D1a as
most suitable. Overall, the variability present in modern Australian
spring wheat cultivars was high, and diverse combinations of alleles
had been successful in the past and were widely grown (Eagles et al.
2009). Recently, Christy et al. (2020) developed a
photoperiod-corrected thermal model that solely utilized the
combination of PPD and VRN alleles to predict wheat phenology and to
identify the phenological suitability of germplasm across the
cropping region in southern Australia. Similar to Cane et al. (2013),
the authors used their model to identify the optimum allelic
combinations required to target the optimum flowering period for
different locations when sown on different dates. By comparing a
series of NILs with different major allele combinations and diverse
phenology in the field, Bloomfield et al. (2019), however, revealed
that a model parameterized solely using multi-locus genotypes is not
accurate enough to predict the adaptation of flowering time under
field conditions. For more accurate predictions, the authors
suggested quantifying minor genetic drivers and including genotype ×
environment (G × E) interaction in models based on genetically
derived parameter estimates.

In breeding programs, the major VRN and PPD loci are usually quickly
fixed when targeted at a specific selection environment. In widely
adapted CIMMYT spring bread wheat, bred mainly in Mexico but globally
distributed through international nurseries and yield trials, the two
spring alleles VRN-B1a and VRN-D1a and the PPD-D1a-insensitive allele
are the most frequent (Van Beem et al. 2005; Dreisigacker et al.
2021a, b). Also apparent is a strong selection pressure against the
spring allele VRN-A1a, which has a strong negative effect on the
accumulation of biomass and yield at the CIMMYT main selection site
at CENEB, in North Mexico, suggesting that genotypes with some
vernalization sensitivity are better adapted. Greater allelic
variation was found at the PPD-A1, PPD-B1 (including copy number
variants), and VRN-D3 loci. Further, alleles at the two more recently
identified photoperiod genes, TaTOE-B1 and TaFT-B3, positively
promoted harvest index and yield (Dreisigacker et al. 2021a, b).


11.5.2 Genomic Prediction 


With the swift development of next-generation 
sequencing technologies, whole-genome marker 
information is generated for all types of germ- 
plasm sets. Instead of using only several major 
loci, genomic prediction/selection aims to uti- 
lize whole-genome marker information to pre- 
dict plant phenotypes (Meuwissen et al. 2001) 
and thus also includes minor genetic drivers of a 
trait. While the approach was initially proposed 
in animal breeding, studies on genomic predic- 
tion have been growing in crops including wheat 
and have become a practical tool in breeding 
(de Los Campos et al. 2009; Crossa et al. 2010, 
2014; Dreisigacker et al. 2021a, b). Flowering 
time as an important agronomic trait has been 
predicted with genome-wide markers in wheat 
using different training and target populations. 
Within-environment, within-population genomic 
prediction accuracies for flowering time or 
heading date, measured as the correlation 
between genomic estimated breeding values and 
the observed trait values, range from 0.4 to 0.7 
in the published literature, depending on 
heritability (Charmet et al. 2014; Zhao et al. 
2014; Liu et al. 2020; Haile et al. 2021; Crossa 
et al. 2016). 
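
The accuracy measure described above can be illustrated with a minimal sketch. It assumes simulated marker data, a ridge-regression (RR-BLUP-style) estimator standing in for the GBLUP-type models used in the cited studies, and a known heritability of 0.6. The function names rrblup_fit and prediction_accuracy and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def rrblup_fit(X_train, y_train, lam):
    """Estimate the mean and ridge-shrunken marker effects.

    Solves (X'X + lam * I) beta = X'(y - mean(y)); lam is the assumed
    ratio of residual variance to marker-effect variance.
    """
    mu = y_train.mean()
    n_markers = X_train.shape[1]
    lhs = X_train.T @ X_train + lam * np.eye(n_markers)
    beta = np.linalg.solve(lhs, X_train.T @ (y_train - mu))
    return mu, beta


def prediction_accuracy(gebv, observed):
    """Pearson correlation between predicted and observed values."""
    return float(np.corrcoef(gebv, observed)[0, 1])


# Simulated example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n_lines, n_markers, h2 = 500, 200, 0.6
X = rng.choice([-1.0, 1.0], size=(n_lines, n_markers))  # inbred-line coding
marker_effects = rng.normal(0.0, 1.0, n_markers)
genetic_value = X @ marker_effects
noise_sd = np.sqrt(genetic_value.var() * (1.0 - h2) / h2)
days_to_heading = 100.0 + genetic_value + rng.normal(0.0, noise_sd, n_lines)

train = np.arange(n_lines) < 400          # first 400 lines train the model
mu, beta = rrblup_fit(X[train], days_to_heading[train],
                      lam=n_markers * (1.0 - h2) / h2)
gebv = mu + X[~train] @ beta              # predictions for held-out lines
accuracy = prediction_accuracy(gebv, days_to_heading[~train])
print(f"prediction accuracy: {accuracy:.2f}")
```

In this sketch the accuracy is computed on lines that were not used for training, which mirrors the cross-validation designs typically reported; the value obtained depends on the simulated heritability and training set size.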

Predicting the performance of plant phe- 
notypes across diverse environments is more 
difficult than within environments 
because phenotypes of more complex traits 
are often influenced by G x E interaction. 
Multi-environment trials (METs) for assess- 
ing the G x E interaction are therefore com- 
mon practice in plant breeding for selecting 
high-performing,  well-adapted lines across 
environments. Models have been developed 
that evaluate G x E interaction in genomic pre- 
diction. Burgueño et al. (2012) were the first to 
use marker and pedigree genomic best linear 
unbiased prediction (GBLUP) models to assess 
G x E. Jarquín et al. (2014) proposed a reac- 
tion norm model where the main and interaction 
effects of markers and environmental covariates 
are introduced using highly dimensional random 
variance-covariance structures of markers and 
environmental covariables. A marker x environ- 
ment (M x E) interaction model proposed 
by Lopez-Cruz et al. (2015) decomposes 
the marker effects into components that are 
common across environments (stability) and 
environment-specific deviations (interaction). 
Genomic prediction models that incorpo- 
rate G x E or M x E interaction have been shown 
to increase prediction accuracies by 10-40% 
with respect to within-environment analyses 
(Dreisigacker et al. 2021a, b; Crossa et al. 2017; 
Pérez-Rodríguez et al. 2017). 
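
The decomposition of marker effects into a component shared across environments and environment-specific deviations can be sketched as an augmented design matrix. The example below is a simplified illustration using ridge penalties rather than the estimation procedure of the cited work; the function names mxe_design and fit_mxe, the penalty values, and the simulated data are all assumptions.

```python
import numpy as np


def mxe_design(X_by_env):
    """Stack [main-effect block | block-diagonal interaction block].

    For environment j the rows are [X_j, 0, ..., X_j, ..., 0], so the
    effect of a marker in environment j is b0 + b_j.
    """
    n_markers = X_by_env[0].shape[1]
    n_env = len(X_by_env)
    main = np.vstack(X_by_env)
    inter = np.zeros((main.shape[0], n_env * n_markers))
    row = 0
    for j, Xj in enumerate(X_by_env):
        inter[row:row + Xj.shape[0], j * n_markers:(j + 1) * n_markers] = Xj
        row += Xj.shape[0]
    return np.hstack([main, inter])


def fit_mxe(X_by_env, y_by_env, lam_main, lam_int):
    """Ridge estimates of shared effects b0 and deviations b_env."""
    n_markers = X_by_env[0].shape[1]
    n_env = len(X_by_env)
    Z = mxe_design(X_by_env)
    # Centering within environment plays the role of environment intercepts.
    y = np.concatenate([yj - yj.mean() for yj in y_by_env])
    penalty = np.concatenate([np.full(n_markers, lam_main),
                              np.full(n_env * n_markers, lam_int)])
    coef = np.linalg.solve(Z.T @ Z + np.diag(penalty), Z.T @ y)
    return coef[:n_markers], coef[n_markers:].reshape(n_env, n_markers)


# Two simulated environments with partly shared marker effects.
rng = np.random.default_rng(1)
n_markers = 30
X_by_env = [rng.choice([-1.0, 1.0], size=(n, n_markers)) for n in (50, 40)]
shared = rng.normal(0.0, 1.0, n_markers)
y_by_env = [Xj @ (shared + rng.normal(0.0, 0.5, n_markers))
            + rng.normal(0.0, 1.0, Xj.shape[0]) for Xj in X_by_env]
b0, b_env = fit_mxe(X_by_env, y_by_env, lam_main=1.0, lam_int=5.0)
```

A large interaction penalty (lam_int) shrinks the deviations toward zero, so the model approaches a single across-environment analysis; a small one lets each environment keep its own marker effects.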

Another way to improve the prediction accu- 
racy of G x E is to introduce secondary traits 
measured in each environment on both the train- 
ing and target populations in multi-trait genomic 
prediction models. Recently, Guo et al. (2020) 
used days to heading as a fixed effect in a multi- 
trait model with additional yield components in a 
panel of US soft facultative wheat. The multi- 
trait predictions demonstrated higher predic- 
tive accuracy than the single-trait models in 
a multi-environment analysis, showing their 
capacity to predict the performance of a geno- 
type in different target environments. Similarly, 
Gill et al. (2021) used multi-trait, multi-environ- 
ment genomic prediction which performed best 
for all agronomic traits in their study including 
days to heading. Other studies introduce environ- 
mental covariates in genomic prediction models 
to predict the performance of lines in new envi- 
ronments (Jarquín et al. 2014; Heslot et al. 2014; 
Malosetti et al. 2016; Ly et al. 2018). 
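
One common way to bring environmental covariates into a genomic prediction model is through covariance (kernel) matrices, in the spirit of the reaction norm model mentioned earlier. The sketch below builds a marker-based kernel, a covariate-based kernel, and their element-wise product for the genotype x environment term. It is a minimal illustration on simulated data; the function names and dimensions are assumptions, and fitting the resulting mixed model is not shown.

```python
import numpy as np


def linear_kernel(M):
    """Centered, scaled cross-product matrix Z Z' / number of columns."""
    M = np.asarray(M, dtype=float)
    sd = M.std(axis=0)
    sd[sd == 0] = 1.0                      # guard against constant columns
    Z = (M - M.mean(axis=0)) / sd
    return Z @ Z.T / Z.shape[1]


def reaction_norm_kernels(markers, env_covariates, line_idx, env_idx):
    """Return record-level kernels for lines, environments, and G x E."""
    G = linear_kernel(markers)             # lines x lines
    O = linear_kernel(env_covariates)      # environments x environments
    K_g = G[np.ix_(line_idx, line_idx)]    # expand to one row per record
    K_e = O[np.ix_(env_idx, env_idx)]
    return K_g, K_e, K_g * K_e             # Hadamard product for G x E


# Simulated example: 20 lines, 100 markers, 4 environments, 6 covariates.
rng = np.random.default_rng(2)
markers = rng.integers(0, 3, size=(20, 100)).astype(float)
env_covariates = rng.normal(size=(4, 6))
line_idx = np.repeat(np.arange(20), 4)     # every line in every environment
env_idx = np.tile(np.arange(4), 20)
K_g, K_e, K_ge = reaction_norm_kernels(markers, env_covariates,
                                       line_idx, env_idx)
```

Because the environment kernel is built from covariates rather than from environment labels, a new environment with measured or forecast covariates can be placed in the same kernel, which is what makes prediction in untested environments possible in principle.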


11.5.3 Integrating Crop Modeling 
with Genome-Based Prediction, 
Phenomics, and Environments 


Modular crop model development approaches 
(Jones et al. 2001) and the rapid advance of 
QTL analyses conducted for a vast number of 
populations under diverse environments have 
opened up opportunities to integrate these 
two methods (see Box 11.1). This integration 
allowed the addition, modification, and main- 
tenance of new components, including more 
recently gene-based functions, into process- 
based crop models (Hoogenboom et al. 2004; 
White 2009; Zheng et al. 2013; Chenu et al. 
2018; Hammer et al. 2019; Robert et al. 2020; 
Tardieu et al. 2021; Oliveira et al. 2011; Hu 
et al. 2021; Boote et al. 2021; Cooper et al. 
2021; Potgieter et al. 2021; Cowling et al. 2020; 
Wallach et al. 2018; Hwang et al. 2017; Yin 
et al. 2018). The first simple gene-based model 
was developed by White and Hoogenboom 
(1996), linking gene information with genotype- 
specific parameters (GSPs) for a dry bean model 
called BEANGRO (Hoogenboom et al. 2019), 
where seven genes were used to estimate 19 
parameters simulating data for 32 cultivars. 


Box 11.1 Crop models simulating flowering 
time 

Most crop models use similar approaches 
to simulate the crop life cycle, integrat- 
ing development rate over time, usually 
assuming a potential development rate 
driven by temperature and modified by 
several other factors such as photoperiod, 
vernalization, and other abiotic stresses 
that may accelerate or delay crop devel- 
opment (Oliveira et al. 2011). The rate of 
development used in many crop models 
follows a triangular or trapezoidal response 
function driven by thermal time (TT), or 
growing degree days (GDD), calculated 
from maximum and minimum air tem- 
perature. The temperature response for 
wheat has a base temperature (below which 
no development occurs) of approximately 
0 °C, an optimal temperature (maximum 
development rate) of approximately 26 °C, 
and a maximum temperature (above which 
no development occurs) of approximately 
34 °C (Hu et al. 2021; Boote et al. 2021). 
These air temperature thresholds and cal- 
culations can vary depending on the crop 
model used: some studies have 
shown that base and optimal temperatures 
can change during the wheat life cycle, and 
some models use mean soil crown temperature 
adjusted for snow depth (Hoogenboom 
et al. 2004; Boote et al. 2021). 
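
The triangular temperature response described above can be written as a short function. This is a minimal sketch using the approximate cardinal temperatures quoted in the text (0, 26, and 34 °C) and the daily mean of minimum and maximum air temperature; actual crop models may use sub-daily temperatures, trapezoidal shapes, or stage-specific thresholds, and the function names here are hypothetical.

```python
def development_rate(temp_c, t_base=0.0, t_opt=26.0, t_max=34.0):
    """Relative development rate (0 to 1) from a triangular response."""
    if temp_c <= t_base or temp_c >= t_max:
        return 0.0
    if temp_c <= t_opt:
        return (temp_c - t_base) / (t_opt - t_base)
    return (t_max - temp_c) / (t_max - t_opt)


def daily_thermal_time(t_min_air, t_max_air,
                       t_base=0.0, t_opt=26.0, t_max=34.0):
    """Thermal time (degree days) accumulated in one day.

    Uses the daily mean air temperature, which is a simplifying assumption.
    """
    t_mean = (t_min_air + t_max_air) / 2.0
    rate = development_rate(t_mean, t_base, t_opt, t_max)
    return rate * (t_opt - t_base)


# A day with 8 °C minimum and 18 °C maximum has a mean of 13 °C.
print(daily_thermal_time(8.0, 18.0))
```

Summing daily_thermal_time over days since emergence gives the accumulated thermal time that a model compares against a stage-specific requirement.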

The day length effect on crop develop- 
ment is accounted for by a photoperiod 
sensitivity factor which results in a daily 
percent reduction of development rate, 
below the threshold of 20 hours of day- 
length. The vernalization effect is computed 
as a function of a vernalization sensitivity 
factor, or maximum development rate to 
reach the threshold number of accumulated 
vernalization days required for a specific 
cultivar. Vernalization is also lost when 
daily maximum temperature is above 30 
°C. Vernalization and photoperiod factors 
are used to modify accumulation of ther- 
mal time from emergence to floral initiation 
(Hoogenboom et al. 2004; Hu et al. 2021; 
Boote et al. 2021; Cooper et al. 2021). 
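
The way photoperiod and vernalization factors modify thermal time can be sketched as follows. The functional forms are deliberately simple assumptions (a linear reduction per hour of daylength below the 20-hour threshold, a linear vernalization factor, a fixed loss of vernalization days per degree above 30 °C, and the minimum of the two factors as the modifier). Individual crop models use their own equations and parameter definitions, so every name and default below is hypothetical.

```python
def photoperiod_factor(daylength_h, sensitivity_pct_per_h, threshold_h=20.0):
    """Fraction of potential development rate retained (0 to 1)."""
    shortfall = max(0.0, threshold_h - daylength_h)
    return max(0.0, 1.0 - sensitivity_pct_per_h / 100.0 * shortfall)


def vernalization_factor(vern_days, required_vern_days):
    """Fraction of the cultivar's vernalization requirement satisfied."""
    if required_vern_days <= 0:
        return 1.0                       # cultivar without a requirement
    return min(1.0, max(0.0, vern_days / required_vern_days))


def update_vernalization(vern_days, daily_vern_gain, t_max_air,
                         devern_threshold=30.0, devern_loss_per_deg=0.5):
    """Accumulate vernalization days, or lose some on hot days (assumed rule)."""
    if t_max_air > devern_threshold:
        loss = devern_loss_per_deg * (t_max_air - devern_threshold)
        return max(0.0, vern_days - loss)
    return vern_days + daily_vern_gain


def effective_thermal_time(thermal_time, vern_factor, photo_factor):
    """Thermal time after applying the most limiting factor (assumed rule)."""
    return thermal_time * min(vern_factor, photo_factor)


# Example: 13 degree days, half-vernalized plant, 10 h daylength.
pf = photoperiod_factor(10.0, sensitivity_pct_per_h=2.0)
vf = vernalization_factor(25.0, required_vern_days=50.0)
print(effective_thermal_time(13.0, vf, pf))
```

In a gene-based model, parameters such as sensitivity_pct_per_h and required_vern_days are the kind of genotype-specific parameters that could be estimated from alleles at photoperiod and vernalization loci rather than calibrated empirically.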

The process-based modelling approaches 
mentioned above have been used to predict 
the development of wheat and many other crops 
with good accuracy across many years, taking 
as input genotype-specific parameters 
(GSPs) in addition to weather, soil, and crop 
management variables (Potgieter et al. 2021). 
However, only recently have these models 
started to incorporate true genetic informa- 
tion to capture differences among cultivars 
instead of empirical GSPs created and cali- 
brated based on processes and observations 
from field and laboratory studies (Boote 
et al. 2021; Cowling et al. 2020), even though 
the idea and first studies date from the late 
1990s (Wallach et al. 2018; Hwang et al. 
2017; Yin et al. 2018). 

Fig. 11.6 Schematic representation of the integrated 
model. The crop must pass through each of the phases 
along the x-axis to reach anthesis. Temperature per se 
controls the progression through each phase in combi- 
nation with the factors presented. Temperature and pho- 
toperiod control the expression of VRN1, VRN2, VRN3, 
and VRN4 genes as demonstrated by the scheme within 
the pentagon (pointed arrows show promotion and flat 
arrows show repression) and subsequent amount of 
[Vrn1], [Vrn2], [Vrn3], and [Vrn4] protein expressed as 
demonstrated by the lines on the graph. The amount of 
these proteins controls the timing of vernalization and 
terminal spikelet (adapted from Brown et al. 2013) 


Since then, there has been a rapid increase in 
the number of research studies including gene- 
based modeling applications, but most of them 
are still limited to crop phenology and other less 
complex traits. Brown et al. (2013) integrated 
molecular and physiological models to simulate 
time to anthesis using lines of spring and win- 
ter wheat under different temperature and pho- 
toperiod conditions. They linked the duration of 
phases to expressions of VRN genes to account 
for the effects of temperature during each devel- 
opmental stage to develop a model (Fig. 11.6). 
This analysis framework was compared with 
CERES, ARCWHEAT1, and SIRIUS model 
approaches, suggesting the possibility of linking 
phenological parameters and anthesis time to the 
alleles or copy number of genes that control the 
expression of protein signals, relating anthesis 
genotype to phenotype. 

Hu et al. (2021) used the APSIM wheat-G 
gene-based phenology model to identify the 
optimal flowering period of spring wheat and 
concluded that this type of model can identify 
the best combination of sowing dates and time 
to flowering to minimize frost and heat risk and 
achieve higher yields. Among the gene-based 
modeling applications, those that can be inte- 
grated with several other breeding tools have the 
greatest potential. Wang et al. (2019) reviewed 
necessary improvements for process-based crop 
models to simulate G x E x M interactions and 
stated that further verification is required of 
temporal gene expression profiles, their 
environmental dependencies, and the expression 
levels needed to trigger key phenological stages. 

A growing body of research focuses on the 
benefits and challenges resulting from the inte- 
gration of several modern technologies into 
breeding programs. These include genomics 
using dense molecular markers, detailed trait 
analysis using advances in phenomics and image 
analysis, the intensive use of environmen- 
tal covariables (environomics), and multi-trait 
analysis in order to accelerate genetic gains and 
increase agricultural production (Crossa et al. 
2019). Incorporating these newly available tech- 
nologies, e.g., computer simulation for genomic- 
assisted rapid cycle population improvement, 
combining rapid genomic cycling with speed 
breeding, high-throughput phenotyping, and 
using historical climate and soil data, has the 
potential to improve conventional breeding schemes. 
Integrating the machinery of crop modeling 
with that of genomic information and phenom- 
ics data together with environomics platforms 
can further increase breeding efficiency. This 
in turn offers great promise for developing varieties 
rapidly since the selection of candidate individu- 
als can be performed with higher accuracy. 

There is evidence that crop models are useful 
for phenotypic prediction of relevant quantita- 
tive traits by simulating the behavior and growth 
of crops using solar radiation, water, nitrogen, 
etc., as input. Still, there is little empirical evi- 
dence that integration of this type of model with 
whole-genome prediction increases the predic- 
tion accuracy for unobserved cultivars. Two sim- 
ulation studies (Technow et al. 2015; Messina 
et al. 2018) showed that integration into a com- 
bined model improved prediction accuracy rela- 
tive to the genomic model alone. 

Grain yield is the ultimate measure of crop 
adaptation due to phenology. Crop models can 
also be used for prediction of complex traits 
such as grain yield for different cultivars and 
location-year combinations within certain eco- 
geographical regions. To do so, it is necessary 
to incorporate into the models the genetic 
variance of the traits and how it changes under 
different environmental conditions. With the 
rapidly increasing availability of data on DNA 
sequences of individual cultivars or breed- 
ing lines, the use of these data to improve 
crop model development and applications has 
advanced rapidly. Similarly, advances in 
the understanding of the control of plant pro- 
cesses at the molecular level offer opportunities 
to strengthen how certain plant physiological 
mechanisms are incorporated into crop models. 

It has been shown that crop models can be 
integrated with genomic prediction to enhance 
prediction accuracy using simulation data. 
For example, Heslot et al. (2014) employed 
crop models to derive stress covariates from 
daily weather data for predicted crop develop- 
ment stages and extended the factorial regres- 
sion model to genomic selection in order to model 
QTL x environment interaction on a genome- 
wide scale. The method was tested using a win- 
ter wheat dataset, and accuracy in predicting 
genotype performance in unobserved environ- 
ments for which weather data were available 
increased by 11.1% on average. Furthermore, 
Cooper et al. (2016) used crop models with 
genomic-enabled prediction applied to an empir- 
ical maize drought data set. These authors found 
positive prediction accuracy for hybrid grain 
yield in two drought environments. 

In general, crop models have been used 
for crop management decision support. The 
presence of G x E x M interactions for yield 
presents challenges for the development of pre- 
diction technologies for product development 
by breeding and product placement for different 
agricultural production systems. Messina et al. 
(2018) combined simulation and empirical stud- 
ies to show how to use crop growth models (CGM) 
with genome-enabled methodology for application 
to maize breeding and product placement 
recommendation in the US corn belt. 

In plant breeding, genetic and environmen- 
tal factors can interact in complex ways, giving 
rise to substantial G x E interactions that can 
be used to select genotypes adapted to specific 
environments. Nevertheless, accurate predic- 
tion of future performance across environments 
is challenging; it requires consideration 
of the possible weather conditions that may 
occur within a region and how individual geno- 
types are expected to react to those conditions. 
Usually, METs conducted over many years and 
across multiple locations are used to facilitate 
such predictions. The major challenge is that, 
in practice, METs are organized over few years 
and locations, such that genotypes are often 
advanced without 
being tested under weather conditions that may 
critically affect their performance. To overcome 
this limited scope of METs, de los Campos 
et al. (2020) proposed data-driven computer 
simulations that integrate field trial data, DNA 
sequences, and historical weather records for 
predicting genotype performance and stability 
using limited years of field testing per genotype. 

The data-driven simulation proposed by de 
los Campos et al. (2020) links modern genomic 
models that integrate DNA sequences (e.g., 
single nucleotide polymorphisms, SNPs) and 
environmental covariates (EC; Jarquín et al. 
2014; Crossa et al. 2019) by means of Monte 
Carlo methods that integrate uncertainty about 
future weather conditions as well as model 
parameters (characterized using their pos- 
terior distribution). The value of this 
approach lies in using ECs to 
characterize the environmental conditions pre- 
vailing during crop growing seasons, not only for 
the current MET location-years but also for past field 
trial data, together with historical (or simulated) weather 
records that describe environmental conditions 
that are likely to occur in a location or region. 
The results of de los Campos et al. (2020) 
show that (1) it is possible to predict the 
performance of cultivars in environments where 
these cultivars have little (or no) testing data 
and (2) predictions that incorporate historical 
weather records are more robust with respect 
to year-to-year variation in environmental con- 
ditions than the ones that can be derived using 
only a few field trials. 
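
The Monte Carlo idea described above can be sketched in a few lines: sample a set of model parameters from their posterior distribution, sample a year from the historical weather record, and compute the predicted performance for that combination. The example below is a toy illustration, not the model of de los Campos et al. (2020). It assumes a single standardized environmental covariate, a linear response per genotype, and simulated "posterior" draws; all names and numbers are hypothetical.

```python
import numpy as np


def simulate_performance(post_intercept, post_slope, historical_ec,
                         n_draws, rng):
    """Predicted performance under uncertain parameters and weather."""
    par = rng.integers(0, len(post_intercept), size=n_draws)   # posterior draw
    year = rng.integers(0, len(historical_ec), size=n_draws)   # historical year
    return post_intercept[par] + post_slope[par] * historical_ec[year]


rng = np.random.default_rng(42)
# Stand-in for 30 years of a standardized covariate (e.g., a heat index).
historical_ec = rng.normal(0.0, 1.0, 30)

# Stand-ins for posterior samples of (intercept, sensitivity to the covariate).
posterior = {
    "stable_line": (rng.normal(5.0, 0.1, 1000), rng.normal(0.0, 0.02, 1000)),
    "responsive_line": (rng.normal(5.2, 0.1, 1000), rng.normal(-0.5, 0.05, 1000)),
}

summary = {}
for name, (intercept, slope) in posterior.items():
    draws = simulate_performance(intercept, slope, historical_ec, 5000, rng)
    summary[name] = (draws.mean(), draws.std())   # expected value, stability
    print(f"{name}: mean={summary[name][0]:.2f}, sd={summary[name][1]:.2f}")
```

The standard deviation of the simulated draws serves here as a simple stability measure: a genotype that responds strongly to the covariate shows a wider spread across historical years than one that does not.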

Further research is needed to add evidence 
that crop modeling together with genomic-ena- 
bled predictions can be of benefit in plant breed- 
ing together with phenomics and environomics. 
Three proposed directions for future research 
are: (a) to use historical data to complement 
the advantages of crop modeling with those of 
genomics and phenomics; (b) to conduct more 
simulation studies with different types of crop 
models, genomics, and phenomics models; and 
(c) to conduct real experiments where the sci- 
entist can control the input of the crop model 
and measure the output as accurately as possible. 
Simulation studies should be conducted to 
benchmark the prediction performance of com- 
bined models (crop model + genomics) com- 
pared to stand-alone genomic prediction models. 
Comparing combinations of different types of 
crop and genomics models, which include ran- 
dom effects for G x E interaction terms, would 
be useful. New deep learning models that have 
been developed for dealing with big data sets 
should also be considered for incorporation with 
crop models for multi-trait, multi-environment 
predictions (Montesinos-Lopez et al. 2018, 
2019). 


11.6 Future Opportunities 


Since its first cultivation in 7000 BC, hexaploid 
wheat has evolved and adapted, enabling its expan- 
sion and underpinning global food security. Adaptive 
genes (and their complex interactions) have 
played an important role in optimizing wheat 
production and will continue to play a significant 
role in fine-tuning flowering and reproductive 
cycles suited to changing climates and evolving 
agricultural production systems. As documented 
in this chapter, many QTL have been detected 
with robust effects within and across environ- 
ments, which have expanded the breadth of adap- 
tive variation to be explored in future. However, 
additional work is required to identify underly- 
ing genes and dissect pathways to understand 
their mode of action and accelerate their vali- 
dation and deployment in breeding. Likewise, 
MQTL can help to identify relevant genomic 
regions over space and time and facilitate the 
identification of new candidate genes. In the 
QTL comparison conducted here, we detected 
seven MQTL regions on chromosomes 1D, 2A, 
2B, 3B, 5A, 5B, and 6A. While five MQTL were 
co-located with known flowering gene regions, 
candidate genes for two MQTL are not yet 
known. The identification of genes underpinning 
the two robust MQTL regions on chromosomes 
3B and 6A and those identified to be in homoe- 
ologous regions (proximal on 5 and distal on 
chromosomes 5, 6, and 7) will offer new poten- 
tial targets for exploitation. These genetic dissec- 
tion efforts will be greatly aided by current and 
future developments in wheat genome sequenc- 
ing and characterization of haplotypes across the 
wheat and progenitor pangenomes. 

Moving beyond the identification of flower- 
ing time loci, it will also become increasingly 
important to understand how genetic regions 
influence other developmental traits and their 
responses to environmental factors. This depth 
of understanding will allow a more targeted 
“design” of adaptive ideotypes to suit current 
and future climates and is likely to influence the 
use of novel breeding methods. For example, the 
timing of flowering and regulation of distinct 
flowering stages may influence the efficiency 
of hybrid wheat seed production, supporting the 
development of mainstream hybrids. Similarly, 
genomic selection network approaches that 
can include multiple traits along with flower- 
ing time are likely to be useful in identifying 
high-performing, optimally adapted lines for 
breeding, selection, and release. Finally, much 
future potential exists in applying recently 
developed integrated genomics and crop mod- 
eling approaches. Advances in gene-based 
modeling, if successful, should 
make it possible to describe growth and devel- 
opment processes with QTL and other genomic 
loci analysis, integrated in process-based crop 
models in a modular approach. This would 
potentially reduce the need for crop model 
calibration using phenotypic data after new 
cultivars are released to assess their response 
to genotype, environment, and management 
(G x E x M) conditions. This can both lever- 
age extensive historical data (available in many 
breeding programs) to identify previously hid- 
den environmental “clues” and provide 
novel targets for the design and deployment of 
further climate change adaptation strategies. 


and end-use traits. Since wheat cultivars and 
landraces have been explored extensively to 
identify novel genes and alleles, one way to 
overcome these pitfalls is by looking into the 
proverbial treasure trove of genomic diver- 
sity that is present in wheat’s wild relatives. 
These wild relatives hold reservoirs of genes 
that can confer broad-spectrum resistance to 
pathogens, increase yield, provide additional 
nutrition, and improve dough quality. Genetic 
approaches and techniques have existed to 
introgress wild chromatin to bread wheat, 
as well as trace introgressions present in 
the germplasm for over seven decades. However, 
with the availability of NGS technologies, it 
is now easier to detect and efficiently inte- 
grate the genetic diversity that lies within 
wheat’s gene pools into breeding programs 
and research. This chapter provides a con- 
cise explanation of current technologies that 
have allowed for the progression of genomic 
research into wheat’s primary, secondary, and 
tertiary gene pools, as well as past technolo- 
gies that are still in use today. Furthermore, 
we explore resources that are publicly avail- 
able that allow for insight into genes and 
genomes of wheat and its wild relatives, and 
the application and execution of these genes 
in research and breeding. This chapter will 
give an up-to-date summary of information 
related to genomic resources and reference 
assemblies available for wheat’s wild rela- 
tives and their applications in wheat breeding 
and genetics. 


12.1 Introduction 

Bread wheat is one of the most important sta- 
ple crops and provides over one-fifth of the calories 
consumed by the world’s population (FAOSTAT 
2020). Global wheat production needs to be 
increased in light of the growing human popula- 
tion and changing climatic conditions (Hickey 
et al. 2019; Ray et al. 2012, 2013; Tilman et al. 
2011). To cope with the numerous challenges 
that wheat faces, such as heat, drought, and dis- 
eases, it is important to find useful sources of 
genes and alleles for its improvement and, at 
the same time, to develop approaches for efficient 
transfer of this useful genetic variability to cul- 
tivated wheat. Efforts have already been made 
in this direction, with major successes 
following the release of the reference genome 
assembly of T. aestivum cv. CHINESE SPRING 
in 2018 (Appels et al. 2018). Since then, 
more and more resources have been added to 
speed up breeding activities and the development 
of markers for important traits. For instance, in 
2019 and 2020, a wheat pan-genome 
resource containing assemblies of 10+ wheat 
genomes, including elite cultivars from across the 
globe, and 1K exome capture data were gener- 
ated (He et al. 2019; Walkowiak et al. 2020). 
Although a high-quality reference assembly 
is available for CHINESE SPRING, it does not 
capture the complete species-specific variation 
that can be exploited for variety development. 
Therefore, the above genomic resources, includ- 
ing the pan-genome and exome capture data, 
have proven to be highly useful. These resources 
have also been exploited for identification of 
useful wild introgressions in wheat followed 
by marker development for biotic and abiotic 
stress tolerance traits. The current pan-genome 
resource consists of ten genomes with pseu- 
domolecule-level assemblies and five genomes 
with scaffold-level assemblies of hexaploid wheat. 

One of the major objectives of any breed- 
ing program has been to develop wheat 
varieties resilient to environmental conditions as 
well as biotic stresses. Significant progress 
has been made in the genetic improvement of 
wheat, mainly after the green revolution, using 
either conventional or molecular breeding 
approaches through marker-assisted selection. 
The introduction of dwarfing genes during the 
green revolution transformed wheat variety 
development and led to a dramatic increase in 
wheat yield across the globe (Ali et al. 1973; 
Hedden 2003; Pingali 2012). Similarly, impor- 
tant genetic markers have also been identified 
for the QTL/genes providing resistance against 
different biotic and abiotic stresses (Saini et al. 
2022; Singh et al. 2021). This has certainly 
improved the breeding populations 
of wheat; however, at the same time it has also 
narrowed the genetic base, resulting 
in reduced species variability. This ultimately 
makes it necessary to explore the wild and 
related species of wheat, which are an important 
reservoir of useful genetic diversity as well as 
genes for biotic and abiotic stress tolerance. 

Based on the evolutionary distance between 
the species and the success rate of interspecies 
hybridization, Harlan and de Wet (1971) intro- 
duced the idea of wheat gene pools that included 
primary, secondary, and tertiary gene pools 
(Fig. 12.1) (Jiang et al. 1993; Mujeeb-Kazi et al. 
2013). While the genomes of the primary and sec- 
ondary gene pools share some homology with the 
wheat genome, the species in the tertiary gene 
pool do not share any homology with the wheat 
genome and, therefore, gene transfer through 
homologous recombination is not possible. It is 
also more difficult to cross the species of the secondary 
and tertiary gene pools with hexaploid wheat 
than the species of the primary gene 
pool (Mujeeb-Kazi et al. 2013). 

Fig. 12.1 Overview of bread wheat’s gene pools with examples in each category 

The species in the primary gene pool include 
modern wheat cultivars and other T. aesti- 
vum landraces, Triticum spelta (AABBDD), 
tetraploid durum wheat T. turgidum (AABB), 
diploid wheat species T. urartu (AA), and 
Aegilops tauschii (DD). Examples of spe- 
cies in the secondary gene pool are the tetraploid 
species T. timopheevii (AAGG) and the diploid 
species T. monococcum (AmAm) and Ae. spel- 
toides (SS). Species in the tertiary gene pool 
include cultivated species such as rye (RR) and 
barley (HH) as well as wild relatives of wheat. 
Importantly, wild relatives of wheat contain a 
treasure trove of variability that can overcome 
the genetic bottlenecks found in bread wheat 
(Tiwari et al. 2015). Examples of these are wild 
grasses such as diploid Thinopyrum elongatum 
(EE), tetraploid Ae. geniculata (UUMM), and 
octoploid Leymus arenarius (XXXXNNNN) 
(Pour-Aboughadareh et al. 2021; Anamthawat- 
Jónsson 2001). Due to the absence of pairing at 
meiosis between the tertiary pool chromosomes 
and those of wheat, techniques such as radia- 
tion-induced chromosomal breaks or gene edit- 
ing must be used to create introgression lines 
(Benlioglu and Adak 2019; Jiang et al. 1993; 
Mujeeb-Kazi et al. 2013). 

As mentioned above, the availability of 
genomic resources in hexaploid bread wheat has 
driven the development of useful markers lead- 
ing to stress-resilient wheat cultivars. However, 
given the complexity of the wheat genome 
owing to its large size and polyploid 
nature, it became necessary to develop genomic 
resources for the above wild relatives of wheat. 
Considerable progress has already been made in this 
direction. For example, the diploid relatives Ae. 
longissima, Ae. speltoides, and Ae. sharonensis, 
as well as several accessions of Ae. tauschii, all 
have recently released reference-quality assem- 
blies available for BLAST and genome brows- 
ing (Avni et al. 2022; Gaurav et al. 2022; Zhou 
et al. 2021). Further, the wild tetraploid 
T. turgidum ssp. dicoccoides cv. ZAVITAN 
has also recently had a high-quality assembly 
released with the use of optical maps for more 
accurate scaffolding. 

The present chapter is mainly focused on pro- 
viding an overview of the available reference 
assemblies and genomic resources in wheat’s 
wild relatives, which have been explored to 
identify useful introgressions in wheat. Some 
examples include (i) Fhb7 (from Th. elongatum), 
providing resistance against Fusarium head 
blight in wheat (Guo et al. 2015); (ii) the well- 
known 1BL/1RS translocations from rye, which 
carry useful genes for improved grain yield 
and biomass, especially under abiotic stress 
(Lukaszewski 1993); and (iii) Lr57 and Yr40 from Ae. 
geniculata, providing resistance against rust dis- 
eases (Kuraparthy et al. 2007a, b). Recent devel- 
opments in next generation sequencing 
technologies have led to the development of low- 
cost sequencing approaches such as skim sequenc- 
ing, which provides a useful resource for the 
identification of alien introgressions even at 
a low coverage of less than 0.1x (Adhikari et al. 
2022b). A comparative overview of syntenic 
relationships between wheat and wild relatives is 
also discussed. Overall, the present chapter will 
serve as a useful resource for students and 
researchers working in alien wheat genomics 
and exploring useful alien introgressions 
in the development of wheat cultivars. 
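
To give a sense of scale for skim sequencing at 0.1x, the arithmetic can be sketched as below. The genome size (about 16 Gb for hexaploid wheat) and read length (150 bp) are assumptions introduced for illustration and are not taken from this chapter. The expected fraction of bases covered uses the standard Lander-Waterman approximation, which assumes reads are placed uniformly at random.

```python
import math


def reads_required(coverage, genome_size_bp, read_length_bp):
    """Number of reads needed to reach a target average coverage."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)


def expected_fraction_covered(coverage):
    """Expected fraction of bases hit by at least one read (Lander-Waterman)."""
    return 1.0 - math.exp(-coverage)


GENOME_SIZE_BP = 16_000_000_000   # assumed approximate hexaploid wheat genome
READ_LENGTH_BP = 150              # assumed short-read length

n_reads = reads_required(0.1, GENOME_SIZE_BP, READ_LENGTH_BP)
fraction = expected_fraction_covered(0.1)
print(f"reads needed: {n_reads:,}; fraction of bases covered: {fraction:.3f}")
```

Under these assumptions, fewer than one in ten bases is expected to be sequenced at 0.1x, which is why skim sequencing is suited to detecting large introgressed segments from read-depth or mapping patterns over windows rather than to calling individual variants.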


12.2 State of Reference Assemblies 
in Wheat and Its Wild Relatives 


Wild and related species of wheat are a reser- 
voir of important genes for different abiotic and 
biotic stress tolerances. Therefore, the avail- 
ability of genomic resources for these wild rela- 
tives will be an asset for the identification 
of genes/QTLs and their linked markers. This 
may help simplify wheat genomics and lead to 
the development of elite wheat cultivars, 
which is otherwise difficult due to the complex and 
large wheat genome. Reference genome assem- 
blies are now available for some of the impor- 
tant wild species belonging to all three wheat 
gene pools. Reference assemblies for the impor- 
tant wheat relatives are described briefly below. 


12.2.1 Primary Gene Pool Reference 
Genomes 


The first draft of the reference genome of bread 
wheat became public in 2014, utilizing 
survey sequencing of individual chromosomes. 
Though this is considered a significant break- 
through in the world of wheat genomics, this 
initial draft sequence only accounted for ~61% 
of the entire wheat genome (Lukaszewski et al. 
2014). Four years later, with the use of addi- 
tional genetic data, including radiation hybrids, 
and sequence data, and with the advancement of 
next generation sequencing (NGS) technologies, 
the fully annotated CHINESE SPRING refer- 
ence genome was released with pseudomolecule 
assemblies for all 21 chromosomes (Appels 
et al. 2018). This reference genome has been 
continuously updated with the use of new tech- 
nologies, with the intent of both more accurate 
contig establishment and scaffolding and 
annotation of genes not initially reported 
in v1.0 (Alonge et al. 2020; Zhu et al. 
2021). Extensive comparative data show that 
CHINESE SPRING is a genetic outlier when 
compared to domesticated species of Triticum 
sp. (Walkowiak et al. 2020). 

The development of the pan-genome of wheat 
has allowed for more precise research and insight 
into the primary gene pool of wheat, includ- 
ing T. spelta. As of December 2022, 13 culti- 
vars of wheat and one cultivar of T. spelta are 
available for BLAST as well as genome brows- 
ing. Interestingly, with the information gained 
by the 10+ Genome Project, alien introgressions 
could be traced using reads derived from 
T. timopheevii and Th. ponticum (JJJJJJJsJsJsJs) 
in T. aestivum cv. LANCER, and Ae. ventricosa 
(NvNvDvDv) in T. aestivum cv. JAGGER in order 
to get more exact coordinates of these loci. 

Tetraploid species of both cultivated (T. 
durum) and wild emmer (T. dicoccoides) wheat 
are also a part of the primary gene pool, due to 
the ability for homologous recombination to 
occur within the shared sub-genomes (A and B). 
When compared to hexaploid wheat, only 5% of 
wheat grown for human consumption is durum, 
and 95% is hexaploid. This may be attributed to 
the genome plasticity of hexaploid wheat which 
allowed for a broader potential for adaptation 
compared to tetraploid wheat (Mastrangelo and 
Cattivelli 2021). Also, compared to hexaploid 
wheat, the elite gene pool of durum wheat has 
little genetic diversity, and most elite durum 
wheat cultivars are moderately to highly 
susceptible to disease (Clarke et al. 
2010; Miedaner and Longin 2014). This is also 
not surprising due to the widely known fact that 
hexaploid bread wheat actually evolved from an 
inter-specific hybridization between T. dicoc- 
coides and diploid species Ae. tauschii (Dvorak 
et al. 2012; Lukaszewski et al. 2014; McFadden 
and Sears 1946). However, it is evident from the 
published reports that wild emmer introgres- 
sions were responsible for significant gains in 
genetic diversity among the hexaploid lines as 
shown recently using the 1000 Wheat Exome 
Project (He et al. 2019). Similarly, the 
phenotypic variance in several important 
traits, including harvest weight, drought 
response, and plant height, is largely attributed 
to these wild emmer introgressions (Nigro et al. 
2022; Zhu et al. 2019). 

Given the importance of wild emmer 
introgressions in hexaploid bread wheat, 
improved reference genomes of both wild 
emmer and cultivated durum wheat were pub- 
lished in 2019. The improved reference genome 
of wild emmer wheat cv. ZAVITAN (WEW) 
utilized optical maps as well as advancements 
in alignment technologies in order to increase 
the effective size of the reference genome 
by ~67 Mb, as well as adding over 2,000 
high confidence genes. Additionally, between 
WEW v.1.0 and WEW v.2.0, gaps of unknown 
size dropped from 2,767 to only 471 (Avni 
et al. 2017; Zhu et al. 2019). Later in 2019, a 
high-quality reference genome of T. durum cv. 
SVEVO was published, and by utilizing the 
WEW data, it was shown that short-term 
evolution had caused little change to 
synteny between WEW and durum. There were, 
however, lower copy numbers of important gene 
families such as NLRs in SVEVO in comparison 
with ZAVITAN, which implies a reduction of 
canonical R-genes (Maccaferri et al. 2019). 

Diploid progenitor species of bread wheat 
genomes A (T. urartu) and D (Ae. tauschii), 
as well as close B genome relative Ae. 
speltoides (SS) all serve as a less complex 
system to work with for genomics research than 
the hexaploid bread wheat (Kerby and Kuspira 
1987). Therefore, in recent years, reference 
genomes for all three wheat genome donors 
(A genome: T. urartu; B genome: Ae. speltoides; 
D genome: Ae. tauschii) have been produced in 
order to help with wheat improvement. While 
the donors for the A and D genomes are included 
in the primary gene pool, the donor for the B 
genome is included in the secondary gene pool. 
Therefore, the reference assemblies for the 
donors of the A and D genomes are discussed in 
more detail below, and the reference assemblies 
for the B genome donor (Ae. speltoides) are 
discussed under a separate sub-heading in the next 
section on the secondary gene pool. 


12.2.2 A Genome 


The T. urartu reference genome was first pub- 
lished in 2018 (Ling et al. 2018), four months 
before the release of the CHINESE SPRING 
v.1.0 reference genome. In their analysis, done 
using the 2014 draft wheat genome v. 0.4, strong 
structural variations were observed between 
the T. urartu A genome and the bread wheat 
A genome, suggesting evolutionary rearrangements. 
Within the diverse population of T. urartu 
accessions used for this study, and using the 
reference genome, three distinct groups were 
identified in the Fertile Crescent. The above 
diverse accessions were screened for powdery 
mildew resistance, and excitingly, after inocu- 
lation with powdery mildew (PM), one group 
(group 2) showed significant resistance against 
the pathogen. Further, analysis using the SNP 
data revealed a single putative candidate gene 
that was involved in providing resistance against 
powdery mildew. This resistance was perhaps 
due to natural selection for powdery mildew 
resistance as well as adaptation to growth at high 
altitudes. 


12.2.3 D Genome 


The D genome progenitor Ae. tauschii is a 
well of genetic variability for wheat, given the 
low level of variation seen within the D genome 
of wheat (Dubcovsky and Dvorak 2007; Voss-Fels 
et al. 2015). This lack of variation is partially 
due to the small proportion of diversity 
that was captured during polyploidization, when 
hybridization occurred between ancient, 
domesticated T. turgidum (AABB) and a small 
population of Ae. tauschii near the Caspian Sea 
(Dubcovsky and Dvorak 2007; Gaurav et al. 2022; 
Luo et al. 2017; Voss-Fels et al. 2015). However, 
due to the ability to develop synthetic wheat by 
hybridizing tetraploid species with Ae. tauschii, 
diversity in the D genome can be integrated 
into the breeding germplasm (Li et al. 2018). 
The first Ae. tauschii reference genome was 
released in 2017 in the background of accession 
AL8/78; the current version (Aet v.5.0) has been 
improved using optical maps as well as PacBio 
long-read sequencing (Luo et al. 2017; Wang 
et al. 2021). 

Since the initial release, several strides 
have been made in Ae. tauschii genomics. For 
instance, Zhou et al. (2021) developed reference 
quality genomes of four additional accessions 
representing four sub-lineages of Ae. tauschii 
with the intent to trace wild introgressions bet- 
ter in the germplasm. In the same year, the Open 
Wild Wheat Consortium (OWWC) generated 
whole genome sequencing (WGS) data for 242 
non-redundant accessions of Ae. tauschii, to 
probe the evolution of bread wheat, determine 
the variation within the population, and per- 
form genome-wide association studies (GWAS) 
for important traits using the AL8/78 reference 
genome (Gaurav et al. 2022). This study was 
able to show the two major lineages that make 
up the D genome in wheat, and using the wheat 
pan-genome, show the physical regions that 
come from these lineages. Additionally, a third 
lineage not associated with the evolution of 
bread wheat was also characterized. 

Further, using k-mer-based GWAS, candidate 
genes for flowering time, stem rust (Sr) resist- 
ance, trichome number, spikelet number, PM 
resistance, and wheat curl mite resistance were 
also reported. Efforts are currently underway by 
the OWWC to develop a pan-genome resource 
for Ae. tauschii which will provide further 
information pertaining to the diversity prevailing 
in the genome sequences of diverse Ae. tauschii 
accessions (openwildwheat.org). 
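
The idea behind a k-mer-based association scan can be illustrated with a toy sketch. The code below is not the OWWC pipeline; it assumes short example sequences held in memory, and the function names (canonical_kmers, kmer_effects) are hypothetical. It records which k-mers are present in each accession and, for every k-mer that is present in some accessions but not others, reports the difference in mean phenotype between the two groups. A real analysis would count k-mers from raw reads and test association with a statistical model that corrects for population structure.

```python
from statistics import mean

# Translation table used to build reverse complements.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Return the set of canonical k-mers in seq.

    A canonical k-mer is the lexicographically smaller of a k-mer and its
    reverse complement, so both DNA strands map to the same key.
    """
    seq = seq.upper()
    found = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):
            continue  # skip k-mers with N or other ambiguity codes
        rev_comp = kmer.translate(COMPLEMENT)[::-1]
        found.add(min(kmer, rev_comp))
    return found


def kmer_effects(sequences, phenotypes, k):
    """Mean phenotype difference (present minus absent) per variable k-mer.

    sequences:  dict mapping accession name -> sequence string (toy input)
    phenotypes: dict mapping accession name -> numeric trait value
    """
    per_accession = {name: canonical_kmers(seq, k)
                     for name, seq in sequences.items()}
    all_kmers = set().union(*per_accession.values())
    effects = {}
    for kmer in all_kmers:
        present = [phenotypes[n] for n, ks in per_accession.items() if kmer in ks]
        absent = [phenotypes[n] for n, ks in per_accession.items() if kmer not in ks]
        if present and absent:  # ignore k-mers shared by all accessions
            effects[kmer] = mean(present) - mean(absent)
    return effects
```

Because presence or absence of a k-mer is scored directly, this style of scan does not depend on the trait-associated sequence being present in the reference genome, which is one reason it suits diverse wild accessions.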


12.2.4 Secondary Gene Pool Reference 
Genomes 


In comparison with the primary gene pool of 
wheat, genomic resources for members of the 
secondary gene pool are limited. Therefore, 
efforts are being made in this direction. For 
instance, (i) reference assemblies for wild and 
cultivated T. monococcum accessions are available 
for public use (Ahmed et al. 2023). 

Diploid wheat T. monococcum, which is a 
close relative of T. urartu (A genome donor), 
is the only species with both domesticated 
(T. monococcum ssp. monococcum) and wild 
type (T. monococcum ssp. aegilopoides) 
accessions. Therefore, these reference assemblies 
will certainly help in simplifying wheat 
genomics and may be an improvement over the 
reference assembly available for T. urartu. 
(ii) Transcriptome data for T. monococcum is 
also available from an earlier study (Fox et al. 
2014). (iii) A core set of wild einkorn as well as 
domesticated einkorn was also recently 
categorized by Adhikari et al. (2022a). 
Using GBS data, 145 domesticated einkorn 
accessions and 584 wild einkorn accessions were 
divided into α, β, γ, and monococcum groups. A set 
of T. urartu accessions was also a part of this 
study, and as expected, they clustered together, 
apart from the T. monococcum accessions. 

When compared to the A and D genomes, the B 
genome of wheat has been difficult to study in 
a diploid species due to the proposed extinction 
of the direct progenitor (Riley et al. 1958; Sarkar 
and Stebbins 1956). Researchers, however, have 
found a workaround for this issue by working with 
species in the Sitopsis section of Aegilops (SS) 
due to their close relatedness with the B genome 
(Kerby and Kuspira 1987). In the last decade, 
reference quality genomes for five Sitopsis 
species were released to provide additional 
resources not only for the elucidation of the B 
genome of wheat, but also for research 
on the D genome of wheat (Li et al. 
2022; Sandve et al. 2015; Yamane and Kawahara 
2005; Yu et al. 2022). Recently, Avni et al. 
(2022) also communicated the release of three 
reference quality genomes in the same section 
(Sitopsis), which included two new assemblies, 
for Ae. longissima (SlSl) and Ae. speltoides, and 
one assembly for Ae. sharonensis (SshSsh), a species 
for which an assembly was in fact first 
communicated by Yu et al. (2022). 

Alignments of the above assemblies with the 
different sub-genomes of wheat revealed a strong 
linear alignment of Ae. sharonensis and Ae. lon- 
gissima with the D genome of bread wheat, and 
that of Ae. speltoides with the B genome, which 
is expected given their close relationship with 
the respective sub-genomes (Fig. 12.2). This was 
also further supplemented with the clustering of 
high confidence gene annotations of Ae. sharon- 
ensis and Ae. longissima with bread wheat’s D 
genome as well as Ae. tauschii, and that of Ae. 
speltoides with WEW, durum wheat, and bread 
wheat’s B genome. 

In March 2022, reference assemblies of two 
additional “S” genomes (Ae. bicornis (SbSb) and 
Ae. searsii (SsSs)) were communicated, finally 
completing the Sitopsis section of the Triticeae. 
In comparative alignments, both of these S genome 
assemblies also clustered with the D genome and 
D genome progenitors of wheat, 
showing their closer association within the 
ancestry of wheat’s evolution. Interestingly, with 
this complete information, the 
divergence of the D-related Sitopsis clade from 
the D progenitors was predicted to have 
happened around 5.23 Mya, whereas the divergence 
of Ae. speltoides from the B genomes of both 
durum and bread wheat happened more recently, 
at 4.44 Mya. 
With these available genomes, more precise 
genomic research can now be performed in the 
B genome of wheat, along with deeper study 
of the evolution of the D genome and its 
progenitors. 


12.2.5 Tertiary Gene Pool Reference 
Genomes 


The tertiary gene pool of wheat is underrep- 
resented in terms of resource availability and 
research, due to the difficulty in defining these 
species, as well as limited genomic information 
(Qi et al. 2007; Schneider et al. 2008; Tiwari 
et al. 2015). Discussions in the literature have 
considered the Sitopsis section species, not 
including Ae. speltoides, as members of the 
tertiary gene pool, but given the recent advancements 
in their genomic resources, it is more fitting to 
place them in the secondary gene pool. Although 
some species, such as T. elongatum (EE), have 
had assemblies and annotations completed 
with regard to gene cloning, no reference 
genomes for wild grasses in the tertiary gene 
pool are currently available (Wang et al. 2020). 

Fig. 12.2 Synteny between diploid wheat chromosomes 1A, 1S, 1D and hexaploid bread wheat's genome 

Two cultivated species belonging to the tertiary 
gene pool, on the other hand, barley 
(Hordeum vulgare; 2n=2x=14; HH) and rye 
(Secale cereale; 2n=2x=14; RR), have had 
reference genomes published in the last ten 
years. Barley was the first 
species in the Triticeae tribe to have a reference 
genome (Melonek and Small 2022; Mochida 
and Shinozaki 2013; Purugganan and Jackson 
2021). The genome was originally sequenced and 
annotated in 2012, and one of the biggest 
achievements in this assembly was overcoming the 
size and complexity of cereal genomes, which stem 
from their highly repetitive elements (Mayer et al. 
2012). Since the release, updates have been made 
to properly order the chromosomes and create a 
better physical map, as well as reduce the unanchored 
sequences from ~250 to 83 Mb (Beier et al. 
2017; Mascher et al. 2017; Monat et al. 2019). 
In a similar fashion to the achievements in 
wheat, a pan-genome project was also developed 
in barley, which included the sequencing and 
assembly of 19 additional barley lines includ- 
ing two highly transformable lines (GOLDEN 
PROMISE and IGRI) as well as a wild barley 
genotype (Jayakodi et al. 2020). This resource 
was, and is, an important milestone in the 
advancement of cereal crop genomics due to its 
early elucidation. Rye is an important member 
of the tertiary gene pool as a contributor of high 
tolerance for both biotic and abiotic stresses. 
Additionally, rye has been an important player 
in wheat breeding due to the importance of the 
1BL/1RS and 1AL/1RS translocations, which confer resistance 
to multiple biotic diseases (Zeller and Sears 
1973; Jung and Seo 2014). Moreover, synthetic 
hybrids of rye and wheat, named Triticale, have 
gained popularity due to their nutritional value 
as forage (Zhu 2018). 

To better understand the underlying genetics 
behind the important aspects of rye, two 
reference quality genomes of rye were 
released simultaneously in 2021. In the article 
by Rabanus-Wallace et al. (2021), a chromosome 
scale assembly was developed in the background 
of cv. LO7, showing a similar genomic 
makeup to other members of the Triticeae, and 
strong collinearity with the barley genome. 
Using this assembly, the researchers were able 
to determine a translocated region conferring 
frost tolerance in a 5A/5RL translocation line, 
first identified using chromosome labeling and 
confirmed using read depth analysis on bread 
wheat’s 5A chromosome and rye’s 5R chromosome. 
In another article by Li et al. (2021), an 
additional genotype of rye, cv. WEINING, had 
a reference assembly created, which provided 
further support for the strong collinearity 
between the tertiary gene pool genomes. In their 
study, utilizing 2,517 single-copy orthologous 
genes, Li et al. (2021) developed a phylogenetic 
tree depicting 12 grasses and their evolutionary 
divergence. Although it is not necessarily new 
information, with the rye genome sequenced, 
the authors were able to compare rye with the 
other 11 sequenced genomes to deduce that rye 
had diverged from wheat ~5 Mya, after barley 
and wheat’s divergence, giving further evidence 
of rye’s closer relationship with bread wheat 
and its progenitors. For a summary of the state 
of reference genomes in Triticeae from the past 
five years, see Fig. 12.3. 


12.3 Alien Introgressions 
and Comparative Genomics 


As described above, wild wheat relatives play 
an important role in the production of high per- 
forming wheat cultivars. Modern breeding tech- 
niques have reduced the genetic diversity in the 
breeding germplasm to select for higher yield 
(Keilwagen et al. 2022; Sansaloni et al. 2020; 
Schneider et al. 2008). Utilizing DNA segments 
from wild relatives that have been integrated 
into bread wheat's genome is a method 
to overcome this reduction in genetic diversity 
(Fig. 12.4); however, methods for detecting 
these introgressions are a must to properly trace 
these segments in breeding programs (Hao et al. 
2020; Molnár-Láng et al. 2015). In this section, 
we will describe the methods, both old and new, 
that researchers utilize to detect and trace these 
introgressions, describe the important genes that 
come from these introgressions, as well as show 
the usefulness of modern technologies for com- 
parative genomic analysis. 


12.3.1 Methods for Detecting Alien 
Introgressions 


Different methods for detecting alien 
introgressions can be broadly classified into 
cytological/cytogenetic methods, PCR-based markers, 
and recent next generation sequencing (NGS)-based 
methods, including skim sequencing. 

Fig. 12.3 Timeline of reference genomes in Triticeae from 2017 to 2022 


12.3.2 Cytological Methods 


Cytological methods for detecting chromo- 
somal morphological differences have been 
used for almost 100 years (Gill and Friebe 
1996). A popular method for observing differ- 
ent sizes and compositions of chromosomes 
was achieved by using centromeric heterochro- 
matin staining, or C-banding, which allows for 
visualization of chromosomes and/or karyotypes 
of different species on a conserved scale (Endo 
and Gill 1996; Gill et al. 1991). This method 
was used for detecting rye/wheat hybrid pair- 
ing as far back as 1977 as well as determining T. 
timopheevii introgressions in T. timopheevii x T. 
aestivum hybrids (Badaeva et al. 1991; Dhaliwal 
et al. 1977). The C-banding method used along- 
side genomic in situ hybridization (GISH) also 
allowed for the detection of introgressions from 
Ae. umbellulata (UU), Ae. speltoides, Ae. comosa 
(MM), Ae. longissima, and T. timopheevii as 
well as several others as far back as the early 1990s 
(Friebe et al. 1996). 

More recently, regions of Leymus racemosus 
DNA containing the important Fusarium head 
blight (FHB) resistance gene Fhb3 introgressed 
into bread wheat were traced using GISH 
and C-banding (Qi et al. 2008). Another, still 
popular, method of visualizing introgressions 
in wheat is fluorescence in situ 
hybridization (FISH), which utilizes 
fluorescent-labeled DNA probes to detect important 
regions of chromosomes, such as introgressions 
(Campos-Galindo 2020; Jiang and Gill 
2006). This method is still much in use today 
to provide further evidence of translocations in 
wheat, including the previously mentioned frost 
tolerance associated region of rye introgressed 
into a wheat background (Rabanus-Wallace et al. 
2021). This method has also been used to dissect 
introgressions coming from T. elongatum, 
Ae. columnaris (UcUcXcXc), Ae. caudata (CC), 
and T. timopheevii, as well as many more not noted 
here (Badaeva et al. 2017; Devi et al. 2019; 
Grewal et al. 2020; Guo et al. 2022). Another 
use of this method was described in 2021, 
where FISH and GISH markers were utilized 
to visualize the recombination patterns of 
susceptible vs resistant genotypes of Ae. geniculata 
(UgUgMgMg) introgression lines in F2 families 
(Steadham et al. 2021). 


12.3.3 PCR-Based Markers 


Another method to detect alien introgressions 
is by using PCR-based markers that are 
polymorphic between bread wheat and the wild 
species. The use of PCR-based markers for 
identifying alien introgressions in bread wheat dates 
back to the early 1990s, when Rogowsky et al. (1993) 
designed PCR and RFLP markers to detect the 
well-known 1AS.1RL, 1BS.1RL, and 1DS.1RL rye 
introgressions in a wheat background. Since then, 
PCR-based markers have been continuously 
implemented for identifying introgressions. 
More recently, Li et al. (2019) designed markers 
to detect Thinopyrum intermedium ssp. 
trichophorum (JJJsJsStSt) introgressions in wheat that 
provide significant stripe rust resistance. To 
illustrate the importance of old and new 
technologies, these researchers utilized GISH, FISH, and 
C-banding in order to validate the effectiveness 
of the PCR markers, which now can be utilized 
in marker-assisted selection (MAS) to incorporate 
these genes into the breeding germplasm. 
Further, polymorphic SSR markers were also 
developed recently to detect introgressions 
from the synthetic amphidiploid species T. kiharae 
(AtAtGGDD), which holds a reservoir of genes 
that have the potential to improve resistance to 
many diseases as well as increase the quality of 
flour production (Orlovskaya et al. 2020). 


12.3.4 NGS Technology 


With the advent of cost-effective NGS meth- 
ods, researchers now have the ability to obtain 
sequence data coming from the transcriptome, 
exome, as well as the whole genome. This data 
can be generated from any species that the 
researchers are interested in, including the wild 
relatives of wheat. Examples of this have been 
mentioned in Sect. 12.2 of this chapter, in regard 
to whole genome assembly; however, data for 
wild relatives is constantly being generated for 
purposes of gene mapping and cloning, as well as 
diving deeper into wild relatives. One such 
example comes from Tiwari et al. (2015), where the 
5Mg chromosome of Ae. geniculata was sorted, 
sequenced, and assembled to gain insight into 
this important species. This information helped 
with the fine mapping of Lr57 and Yr40 in 
translocation wheat lines (Steadham et al. 2021). 

In the past 5 years, NGS data has been uti- 
lized to detect introgressions in Triticeae spe- 
cies without the additional step of SNP calling, 
which can create artifacts as well as require more 
computational resources (Li and Wren 2014). 
Genotyping by sequencing (GBS) data provides 
short and low coverage genomic data, usually 
for the purpose of creating VCF files in order to 
genotype a population with relatively low com- 
putational and storage requirements (Perea et al. 
2016). This data has now been shown to be able 
to discern introgressions in both wheat and 
barley. Keilwagen et al. (2019) 
were able to detect putative introgressions from 
wild relatives in wheat, including the 1BL/1RS 
translocation. Interestingly, in the panel of 209 
elite European winter wheat varieties for which 
GBS data was generated, many of the regions 
where introgressions were detected overlapped 
with important genes used in breeding 
programs, such as Yr17 from Ae. ventricosa 
(NvNvDvDv) and Lr19 from T. ponticum, as well 
as genes not yet known to be from wild relative 
introgressions, such as Glu-D1 and Ppo-D1. 
Due to the decrease in the cost of WGS data 
generation, one group set out to see the benefit 
of using resequencing data from multiple wild 
relatives to detect introgressions, utilizing the 
10+ genomes described above. Keilwagen et al. 
(2022) used wild relative WGS data, both from 
public repositories and generated in 
their own experiments, to determine the regions 
of wild introgressions in 10 genotypes gathered 
from the 10+ wheat genome project. In doing so, 
9 introgressions coming from the wild relatives Ae. 
ventricosa, Ae. markgrafii (CC), Ae. speltoides, 
T. timopheevii, Ae. umbellulata, Ae. uniaristata 
(NN), and T. ponticum were found to be present 
on chromosomes 2A, 2B, 2D, 3D, and 4A. The 
researchers determined that the introgressions 
found on 2AS (from either Ae. ventricosa or Ae. 
markgrafii), 2B (from T. timopheevii), and 2DL 
(from Ae. markgrafii or Ae. umbellulata) 
contained genes that shared ≥90% amino acid 
similarity with leaf rust and stripe rust 
resistance genes. Fascinatingly, when the two 
introgressions coming from Ae. speltoides that were 
present in all 10 genotypes, on 2A and 4AL, were 
checked in relatives of bread wheat, 
they were found in T. urartu, T. 
boeoticum, and T. monococcum, but not in T. 
dicoccoides or T. spelta. The study also 
determined that these introgressions could be 
detected using only 1% of the total data. 
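
The general logic of spotting an introgressed segment from read depth can be sketched in a few lines. This is a simplified illustration and not the method published by Keilwagen et al.; it assumes that reads have already been aligned to a reference and counted in fixed-size genomic bins, and the function names, the 0.5 ratio cut-off, and the three-bin minimum run length are all hypothetical choices. Each bin's share of the sample's reads is compared with the share seen in a control line, and runs of consecutive bins where the sample is strongly depleted are reported as candidate regions in which the reference-like sequence has been replaced.

```python
def normalized(counts):
    """Convert raw per-bin read counts to fractions of the library total."""
    total = sum(counts)
    return [c / total for c in counts]


def flag_low_coverage_runs(sample_counts, control_counts,
                           max_ratio=0.5, min_bins=3):
    """Return (first_bin, last_bin) index pairs for depleted runs.

    A bin is 'low' when the sample's normalized depth divided by the
    control's normalized depth is below max_ratio. Only runs of at least
    min_bins consecutive low bins are reported, which filters out
    isolated noisy bins. Both thresholds are illustrative assumptions.
    """
    sample = normalized(sample_counts)
    control = normalized(control_counts)
    low = [(s / c) < max_ratio if c > 0 else False
           for s, c in zip(sample, control)]

    runs, start = [], None
    for i, is_low in enumerate(low + [False]):  # sentinel closes a final run
        if is_low and start is None:
            start = i
        elif not is_low and start is not None:
            if i - start >= min_bins:
                runs.append((start, i - 1))
            start = None
    return runs
```

Because each library is normalized to its own total and the signal is aggregated over whole bins, the comparison depends on relative rather than absolute depth, which is consistent with the observation that introgressions remained detectable after heavy down-sampling.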

To further save on computational cost, 
researchers have shown that skim sequencing 
of genomes can be used at a coverage as low as 
0.025x, to determine introgressions, as described 
by Adhikari et al. (2022a, b). These authors used 
this method to determine barley introgressions 
on chromosomes 7A, 7B, and 7D in a popula- 
tion of 384 wheat-barley introgression lines. 
Additionally, they screened T. intermedium-durum 
wheat amphiploid lines to find not only 
lines where there were possible introgressions, 
but also certain lines containing whole wheat 
chromosomes. Due to the efficacy and precision 
of this method of detecting introgressions, it 
is likely to define the 
future of alien introgression mapping procedures 
for researchers not only in wheat, but 
in all important crops. 


12.3.5 Agronomically Important 
Genes Coming from Alien 
Introgressions 


One of the most important alien introgressions 
in wheat is the 1BL/1RS translocation, in 
which the short arm of chromosome 1R of rye 
has replaced the short arm of 1B of wheat. This 
introgression has been used in wheat breeding 
not only for the disease resistance that is 
associated with it, which has 
since become obsolete, but also because of 
the increased root biomass that has a positive 
effect on yield (Zeller and Hsam 1983; Sharma 
et al. 2011; Villareal et al. 1998). Despite the 
negative effects of this translocation on bread 
making quality, ~30% of modern cultivars 
contain the 1BL/1RS segment (Wang et al. 2017; 
Zeller et al. 1982). For a list of varieties 
containing 1R translocations, visit 
http://www.rye-gene-map.de/rye-introgression/index.html 
(see also Ru et al. 2020). In recent years, new 
1BL/1RS lines have been developed to overcome 
some of the shortcomings of older introgressed 
lines, in which resistance against stripe 
rust as well as drought tolerance was observed 
(Ren et al. 2022; Sharma et al. 2022; Gabay 
et al. 2020). 

Ae. geniculata is also a genetic goldmine due 
to the strong disease resistance genes that are 
present in some accessions. The line TA10437, 
in which the 5Mg chromosome was sequenced 
in 2015, contains important resistance genes 
against nefarious pathogens such as stripe rust 
and leaf rust (Tiwari et al. 2015). Recently, the leaf 
and stripe rust resistance genes Lr57 and Yr40, 
respectively, have been fine mapped in Ae. 
geniculata translocation lines utilizing mapping 
populations derived from a cross between 
resistant TA10437 derived introgression lines and 
susceptible disomic 5Mg addition lines in the 
background of CHINESE SPRING (Steadham 
et al. 2021). In this study, Lr57 and Yr40 were 
not only fine mapped to a 1.5 Mb region of the 
introgressed Ae. geniculata 5Mg segment, but 
phenotyping of the mapping population 
and the donor parent of the 5Mg segment 
also provided further evidence 
of the broad-spectrum resistance of Lr57, 
confirming the results of an earlier study (Kuraparthy 
et al. 2007a, b). Moreover, this study showed 
that recombination is achievable in alien 
introgressions by crossing introgression lines with 
disomic lines containing homologous 
chromosomes of the alien species. 

Sources of biotic disease resistance coming 
from wild relatives are unequivocally important 
for the sustenance and improvement of wheat; 
however, due to the associated linkage drag, 
durum and bread wheat breeders make only limited 
use of "exotic" resistance genes from wild or 
cultivated relatives in their elite material (Hafeez 
et al. 2021; Steiner et al. 2019). But with the 
ever-increasing knowledge of wild wheat 
relatives, new genes that confer resistance are being 
integrated into the germplasm without a yield 
penalty. The powdery mildew and stripe rust 
resistance genes Pm5V and Yr5V, respectively, were 
transferred from the annual diploid wheat relative 
D. villosum (VV) via amphiploid generation 
(T. turgidum x D. villosum, AABBVV) (Zhang 
et al. 2022). In order to integrate these genes 
into the germplasm, subsequent crossing with 
elite D. villosum introgression lines was 
performed and yielded lines with comparable yield 
to that of elite bread wheat lines. However, due 
to grain softness that is also associated with the 
5V chromosome, chemical mutagenesis was 
performed to knock out this undesirable trait, 
resulting in comparable yielding, hard grained 
genotypes for utilization in wheat breeding. A 
summary of disease resistance genes coming 
from wild relatives is given in Table 12.1. 

Outside of resistance, genes controlling 
yield-related and end-use traits coming 
from wild relatives have also been utilized by 
researchers to further exploit the benefits of 
these species. The wild tetraploid wheat relative 
Agropyron cristatum (PPPP) has been used as a 
donor for abiotic and biotic stress resistance, as 
well as for yield-related traits, for over 30 years 
(Chen et al. 1992; Zhang et al. 2015). Zhang et al. 
(2018) found that 
Pubing260, a T3BL.3BS/6PL translocation line 
containing a small terminal introgression from 
Ag. cristatum, had increased grains per spike, 
spikelets per spike, thousand kernel weight, 
and flag leaf width in comparison with elite 
bread wheat genotypes without this segment. 
Additionally, in 2022, a high molecular weight 
glutenin subunit (HMW-GS) gene coming from 
Ae. tauschii was directly introduced into bread 
wheat, and although the dough quality was 
reduced slightly, the quality of Chinese steamed 
bread increased (Bo et al. 2022). 


Table 12.1 Disease resistance genes coming from wild relatives 

Trait | Wild relative | Gene | Chromosome | Reference 
Powdery mildew resistance | Triticum monococcum | Pm1b, Pm25 | 7AL, 1AS | Hsam et al. (1998), Shi et al. (1998), Murphy et al. (1999) 
 | Triticum urartu | Pm60 | | Zhang et al. (2022) 
 | Triticum turgidum var. dicoccoides | Pm16, Pm26, Pm30, Pm31 | | Reader and Miller (1991), Rong et al. (2000), Liu et al. (2002), Xie et al. (2003) 
 | Aegilops speltoides | Pm53 | 5BL | Petersen et al. (2015) 
 | Dasypyrum villosum | Pm55, Pm5V and Yr5V | 5AL, 5DL | Zhang et al. (2015), Zhang et al. (2022) 
 | Aegilops tauschii | Pm35 | 5DL | Miranda et al. (2007) 
Leaf rust/stripe rust resistance | Triticum ventricosum | Yr17, Lr37 and Sr38 | | Delibes et al. (1993), Jahier et al. (1996) 
 | Agropyron elongatum | Lr19/Sr25 | | Sharma and Knott (1966) 
 | Aegilops geniculata | Lr57 and Yr40 | 5DS | Kuraparthy et al. (2007a, b) 
 | Aegilops peregrina | LrAp | 6BL | Narang et al. (2020) 
 | Aegilops caudata | LrAC | 5DS | Riar et al. (2012) 
 | Aegilops markgrafii | LrM | 2AS | Rani et al. (2020) 
 | Aegilops umbellulata | Lr9 | 6BL | Sears (1956) 
 | Aegilops triuncialis | Lr58 | | Kuraparthy et al. (2007a, b) 
 | Aegilops tauschii | yl e r2 IGT. Lr42, Lr22a | | Rowland and Kerber (1974), Kerber (1987), Cox et al. (1994) 
 | Thinopyrum ponticum | Sr26 and Sr61 | 6AL | Zhang et al. (2021a, b) 
Stem rust resistance | Thinopyrum intermedium | Sr44 | 7DL | Liu et al. (2013) 
 | Secale cereale | Sr50 | | Mago et al. (2015) 
Fusarium head blight resistance | Leymus racemosus | Fhb3 | 7AS | Qi et al. (2008) 
 | Thinopyrum elongatum | Fhb7 | 7DL | Wang et al. (2020) 


12.4 Available Resources 
for Sequence Data and Plant 
Material 


Availability and accessibility of resources is par- 
amount for the development of higher yielding, 
disease resistant cultivars of wheat. Fortunately, 
there exist web-based databases for the 
extraction of genomic and transcriptomic information 
regarding wheat and its relatives. Furthermore, 
there are avenues available for requesting seed 
material for many of the species mentioned 
above. In this section, we will provide an 
overview of the publicly available sites that can be 
utilized not only to browse and obtain genomic 
data from bread wheat and wheat's wild 
relatives but also to request seeds from 
repositories across the world. 


12.4.1 Web-Based Databases 
for Sequence Data 


The National Center for Biotechnology Information (NCBI) is a resource
for genetic research on almost any species for which any type of
sequence information has been generated (Sayers et al. 2022). Its
user-friendly website allows any topic to be searched easily, returning
results across all 35 of its databases. A simple search for the term
“Triticum” on December 12, 2022, yielded results in 26 of the 35
available databases. Over 4 million hits from this search were in the
nucleotide database, and ~3 million were in the protein database.
Moreover, NCBI’s Sequence Read Archive (SRA) is a significant repository
of NGS reads deposited by researchers across the globe. SRA records are
mostly publicly available and include genome and transcriptome data that
can be searched with BLAST. A search for T. intermedium in the SRA
database yields over 4 thousand results, 184 of which are reads from
genome sequencing. Suffice it to say, NCBI's website is a significant
source of information, especially for those who may not have the funding
for their own NGS studies. However, because data are deposited into
these databases through so many avenues, navigating to curated data for
a specific species can be overwhelming. In particular, when running
BLAST against these databases, many of the hits returned may be outdated
or redundant.
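A keyword search of this kind can also be scripted. The following is a minimal sketch, assuming NCBI's public E-utilities ESearch endpoint is used; the helper names and the sample reply are illustrative and not part of the chapter, and the reply is hand-written so that the parsing step can be shown without a network call.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base address of NCBI's public E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db: str, term: str, retmax: int = 0) -> str:
    """Return an ESearch URL asking database `db` for records matching `term`."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{ESEARCH_URL}?{query}"


def parse_hit_count(xml_text: str) -> int:
    """Extract the total number of hits from an ESearch XML reply."""
    root = ET.fromstring(xml_text)
    return int(root.findtext("Count"))


# Hypothetical, hand-written reply used only to demonstrate parsing;
# real counts change as new records are deposited.
sample_reply = "<eSearchResult><Count>42</Count><RetMax>0</RetMax></eSearchResult>"

url = build_esearch_url("sra", "Triticum")
hits = parse_hit_count(sample_reply)
```

Fetching `url` with any HTTP client would return a reply of the same shape, so hit counts for several databases or search terms could be tabulated in a loop.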

Ensembl overcomes some of the pitfalls of NCBI by allowing users to
select specific organisms to browse (Cunningham et al. 2022). Moreover,
Ensembl Plants excludes species from Animalia, Fungi, and the
prokaryotes, which declutters searches for specific species. Although
its database is not as extensive as NCBI's, navigation is much easier in
certain respects. The BioMart and Downloads tabs give easy access to
nucleotide and protein data for the hosted species, which can be
downloaded from a single web page. Ensembl Plants stays up to date with
current versions of reference genomes, including the newest versions of
T. aestivum, Ae. tauschii, and H. vulgare, while old versions remain
available. Another significant feature of Ensembl is the variation track
available for some species. This feature allows users to find variants
in specific genomic regions, whether naturally occurring or induced by
chemical mutagenesis. By clicking on this track, users can browse the
effects of these variants or, in some cases such as T. aestivum, find
accession numbers for mutant genotypes. This is very important for
researchers looking for variants of candidate genes in gene cloning
projects, as it makes knockouts and/or missense mutations in candidate
regions easy to find. Unfortunately, very few relatives of wheat are
available for BLAST, genome browsing, or data acquisition. Currently,
T. urartu, Ae. tauschii, rye, and barley are the only diploid relatives
of wheat accessible through this website.
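Once variant records for a candidate region have been exported, picking out knockouts and missense mutations is a simple filter. The sketch below assumes each record carries a Sequence Ontology consequence term of the kind Ensembl reports; the record layout, gene names, and line names are hypothetical placeholders, not an Ensembl export format.

```python
# Sequence Ontology terms that usually indicate loss of gene function.
KNOCKOUT_TERMS = {
    "stop_gained",
    "frameshift_variant",
    "splice_donor_variant",
    "splice_acceptor_variant",
}
MISSENSE_TERM = "missense_variant"


def classify_variants(variants, gene_id):
    """Group the mutant lines carrying knockout or missense variants in one gene."""
    result = {"knockout": [], "missense": []}
    for v in variants:
        if v["gene"] != gene_id:
            continue  # variant lies in a different gene
        if v["consequence"] in KNOCKOUT_TERMS:
            result["knockout"].append(v["line"])
        elif v["consequence"] == MISSENSE_TERM:
            result["missense"].append(v["line"])
    return result


# Hypothetical records; gene and line names are placeholders.
variants = [
    {"gene": "geneA", "line": "mutant_001", "consequence": "stop_gained"},
    {"gene": "geneA", "line": "mutant_002", "consequence": "missense_variant"},
    {"gene": "geneA", "line": "mutant_003", "consequence": "synonymous_variant"},
    {"gene": "geneB", "line": "mutant_004", "consequence": "stop_gained"},
]
candidates = classify_variants(variants, "geneA")
```

Which consequence terms count as a knockout is a judgment call; the set above is one reasonable choice and can be widened or narrowed for a given project.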

For researchers who work specifically on small grains, GrainGenes is a
curated database with many useful features (Yao et al. 2022). Genome
browsers are easy to find and are available for several wild relatives
of wheat, including the five accessions of Ae. tauschii and three of the
genomes in the Sitopsis section mentioned above. BLAST searching is also
robust, with many wild relatives to select from, including all members
of the Sitopsis section. Additionally, GrainGenes has an easy-to-use
search for markers and probes reported in the literature, as well as
useful tools such as genome-specific primer (GSP) design. The website,
however, has become more cumbersome over the years as more and more data
have been added, though a more user-friendly interface is currently
being developed.

A wheat-specific database also exists in the form of URGI (Alaux et al.
2018; see also Chap. 2). This site supports wheat-focused research
through BLAST searches that can be run against specific chromosomes for
all available genome versions. This is important because different
versions of the reference genome are often used in the literature.
Although not as user friendly as the previously mentioned databases,
this site contains a significant amount of sequence data for wheat.


12.4.2 Germplasm Acquisition 
Resources 


Researchers across the globe are willing to share material with one
another for the greater good of assuring food security. In wheat
research specifically, seed can be requested from multiple sources. One
example is the Wheat Genetics Resource Center (WGRC), hosted by Kansas
State University. This site gives direct access to alien species from
the aforementioned Sitopsis section, as well as to multiple other
Aegilops species, such as Ae. geniculata. It also provides access to
Triticum species, including the diploids T. monococcum and T. urartu.
The WGRC additionally holds 95 unique accessions of Dasypyrum villosum
from several different countries. Alien translocation lines with
transfers from Aegilops, Dasypyrum, Triticum, Secale, and Agropyron
species are directly accessible from this resource as well. The site
also links to other important germplasm collections and seed
distributors such as CIMMYT and the USDA.

CIMMYT (the International Maize and Wheat Improvement Center) and the
USDA use the Germplasm Resources Information Network (GRIN) or
GRIN-Global to give international institutions access to germplasm of
several different plant species, including wheat and some of its wild
relatives. Although this resource is not specifically tailored to wheat
researchers, wild species belonging to Aegilops and Triticum are
available. Similarly, Genesys is a resource for many different crop
systems, but its user-friendly interface makes it easy to search for
Triticum species. This site contains over 12 thousand Aegilops
accessions alone, which are organized into subsets, including Aegilops
core sets.

The OWWC, mentioned in Sect. 12.2.1, makes its panel available through
the Germplasm Resources Unit (GRU) hosted by the John Innes Centre. The
GRU offers resources similar to those of the aforementioned sites; in
addition, it holds a core collection of Triticeae wild relatives that
includes Dasypyrum, Aegilops, Triticum, and Eremopyrum. It also holds
seed resources for mutant, DH, and other mapping populations in wheat,
as well as historical landraces.


12.5 Concluding Remarks 


The ever-increasing breadth of knowledge coming from wheat and its
relatives has large implications for improving the overall quality of
cultivars in the coming years. This chapter gives an up-to-date overview
of recent advances in genomic resources within wheat, highlighting the
importance of wild relatives and alien introgressions within the
germplasm. The availability of the wheat pan-genome has allowed
researchers to trace introgressions present within cultivars across the
world. Some of these alien introgressions were found across the entire
pan-genome, giving further evidence of the importance of this genetic
diversity (Keilwagen et al. 2022). As more of these wild genomes receive
reference-quality assemblies, we can learn more about the important
genes that lie within these species. According to the OWWC website, a
pan-genome of Ae. tauschii is currently underway, which will give
researchers a more in-depth understanding of the diversity present in
this progenitor species. High-quality reference genomes are still
required for some important species, such as T. elongatum, D. villosum,
and Agropyron species. Researchers would also benefit from pan-genomes
representing other important wild relatives mentioned in this chapter,
such as wild diploid Triticum species and tetraploid Aegilops species.
Extensive resources for obtaining both genomic data and seed material
for these species are available for public use, making further novel
research possible across the globe. It is an exciting time to work in
wheat research, with the ability to obtain diverse populations of not
only bread wheat and its primary gene pool but also members of the
secondary and tertiary gene pools from collaborators in different
countries. The web-based resources that now exist allow quick turnaround
not only for basic scientific knowledge but also for the integration of
this diversity into local breeding programs. A future prospect that
could make this process even more efficient is a centralized database
through which these independent seed and data repositories can be
accessed. CIMMYT and the USDA make it easy to find material from either
institution by using systems like GRIN and GRIN-Global, which share
germplasm requests; however, many other institutions do not use these
systems for requests and distribution, and many of them are not
necessarily tailored to wheat-based research. A similar central database
would be beneficial for the amount of sequence data that is becoming
available in Triticeae. A system for searching data pertaining to
specific gene pools could prove beneficial for future research,
especially as more genomes are sequenced. The examples and information
provided here will hopefully make it easier for researchers, students,
and curious minds alike to find information pertaining to wheat and the
many species that make up its gene pools.
Abstract


This century faces huge challenges across the world, such as climate
change, water shortage, malnutrition, and food safety and security.
These challenges can only be addressed by (i) the deliberate application
of cutting-edge technologies and (ii) the combined use of
interdisciplinary, multidisciplinary, and even transdisciplinary tools
and methods. For scientists to respond to these challenges in a timely
manner, new tools and technologies must be adopted and their outcomes
transformed into “knowledge”. It is highly unlikely that we could meet
the demands of the year 2050 unless we use scientific and technological
resources effectively and efficiently. Multidisciplinary and
interdisciplinary approaches combined with all available tools are
integral to academic and industry programs. This chapter summarizes
wheat breeding and genetics coupled with genomics and speed breeding
tools to assist with crop development and improvement.


Keywords 


Genomics-aided breeding · Haplotype mapping · Speed breeding · Wheat
genetic resources


13.1 Sustainable Increase in Global Wheat Production


Wheat (Triticum spp.) is a major source of carbohydrates and a staple
food for the global population. Genetically diverse wheat resources show
variable ploidy levels (diploid, tetraploid, and hexaploid) as a result
of prolonged evolution and the wheat domestication process (Jordan et
al. 2015). Because wheat is an allopolyploid crop, breeding and genetic
investigations are generally considered challenging. This has led to
conventional breeding approaches being complemented by genome-assisted
breeding, in which the genomics toolbox and the available reference
genomes are used to deal with the highly repetitive wheat genome and to
decipher genotype-phenotype associations (Varshney et al. 2021a). More
specifically, the increased sophistication of sequencing technologies
and their interpretation has led to extensive re-sequencing of low-copy
genomic regions (Nyine et al. 2019) in diverse wheat haplotype mapping
populations, which are managed with reduced crop cycles through speed
breeding, or fast-forward breeding, for wheat improvement (Varshney et
al. 2021b; Jordan et al. 2022). A key requirement is to understand
diverse wheat genetic resources for trait improvement, environmental
adaptation, and disease resistance under ongoing climate change.

Genomics-assisted breeding (GAB) has contributed to germplasm
enhancement and to the crop/cultivar development process by
characterizing allelic variation for important agronomic traits
associated with crop production and quality attributes, as well as
tolerance to abiotic and biotic stresses (Varshney et al. 2005). With
the advent of genome sequencing and the inclusion of genetic markers in
sequence repositories, a variety of genomic tools and approaches have
become accessible for use in plant breeding. These include GAB, which
can assist in selecting appropriate parental lines for the various
crossing programs in a breeding platform, ultimately creating genetic
variation for pyramiding into breeding lines (Varshney et al. 2005). A
wide variety of molecular genetic markers, such as simple sequence
repeats (SSR), diversity array technology (DArT), single feature
polymorphisms (SFP), and single nucleotide polymorphisms (SNP), are now
available, as are inter-specific and intra-specific mapping populations
(Kover et al. 2009), for chromosome sequence-aided, molecular
marker-based selection strategies (Akpinar et al. 2017; Maccaferri et
al. 2022).
13.2 Application of Genomic
Breeding (GB) to the 
Development of Future Crops 


Several GB methods, including marker-assisted selection (MAS),
marker-assisted recurrent selection (MARS), haplotype-based breeding
(HBB), marker-assisted backcrossing (MABC), promotion/removal of alleles
through genome editing (PAGE/RAGE), and genomic selection (GS), can be
used concurrently with speed breeding to design new crop varieties
(Varshney et al. 2021a).


13.2.1 Haplotype-Based Breeding 
(HBB) in Wheat 


Recent developments in crop genomics have sparked novel technologies
that aim to diversify plant breeding strategies by combining desired
phenotypes (Fig. 13.1) with haplotypes constructed from genotype
sequencing information (Varshney et al. 2005, 2021a). For haplotype
construction, various crop species have made use of large SNP data sets
obtained from genomic sequence-based technologies on multiple genotypes
(Varshney et al. 2005) in order to define haplotype-linked biomarkers.
Haplotype construction was initially challenging with the short-read
sequences obtained through second-generation sequencing because of the
lower probability of allelic variations, in the form of single
nucleotide polymorphisms (SNPs) or insertion-deletions (InDels), being
present. In contrast, defining haplotypes with long-read sequences has
become simpler, and in many crop species the information is readily
available for a large number of individuals, including through
single-cell approaches and high-quality long-read sequencing
technologies from Pacific Biosciences (PacBio) and/or Oxford Nanopore
Technologies (ONT), which reveal considerably greater genomic diversity
(Torkamaneh and Belzile 2022). Constructing haplotypes from breeding
line sequencing data proceeds through the discovery and evaluation of
changes in the haplotype fingerprint using whole genome sequencing (WGS)
data (Bevan et al. 2017; Bhat et al. 2021). Constructing haplotypes from
adjacent SNPs on a chromosome is an alternative method that may be used
to increase the power of genome-wide association studies (GWAS).
Haplotypes, in this sense, are particular collections of alleles found
on a single chromosome. They are passed through the generations of a
population together, and there is a low probability that they will
recombine in the future.

Fig. 13.1 Overview of breeding strategies for crop improvement through
GAB. The image was created using BioRender (https://biorender.com/)
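As a minimal illustration of a haplotype as a collection of alleles at adjacent SNPs, the sketch below joins the alleles each line carries at a chosen set of SNP positions and tallies how often each combination occurs. It assumes fully inbred lines, so that every line carries a single haplotype per region; the line names and allele strings are invented for the example.

```python
from collections import Counter


def haplotype_frequencies(genotypes, snp_indices):
    """Return the frequency of each haplotype formed by the chosen SNP positions.

    `genotypes` maps a line name to a string with one allele per SNP.
    """
    counts = Counter(
        "".join(alleles[i] for i in snp_indices) for alleles in genotypes.values()
    )
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}


# Invented allele calls at four adjacent SNPs for four inbred lines.
lines = {"L1": "ACGT", "L2": "ACGA", "L3": "TCGT", "L4": "ACGT"}
freqs = haplotype_frequencies(lines, [0, 1, 2])
```

Here three lines share the haplotype ACG at the first three SNPs and one carries TCG, so the region is summarized by two haplotypes rather than by three separate markers.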

Research on Triticum spp. has shown that haplotype-based GWAS can be
preferable to single-marker analysis for assessing the effects of
allelic variation (Sehgal et al. 2020) and allows HBB to produce
customized crop varieties by combining superior haplotypes into a single
plant, particularly novel haplogroup combinations. A wider pool of
haplotype-linked genetic markers gives wheat breeders a greater chance
of developing high-performing, linkage-drag-free hybrids (Varshney et
al. 2021b). The transmission of haplotypes within genetic populations
must be monitored in order to pinpoint the best parents to cross to
produce offspring with the beneficial adaptive and desired traits that
are crucial for creating novel genetic compositions. On this premise,
useful haplotypes have been identified by combining the results of
extensive whole-genome sequencing with haplo-phenotyping database
analysis (Bhat et al. 2021).
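The idea of linking haplotypes to phenotypes can be sketched in a few lines: group lines by the haplotype they carry, average a trait within each group, and pick the haplotype with the highest mean. This is a simplified illustration rather than the procedure of the studies cited; it ignores population structure, environment, and statistical testing, and the haplotype labels and trait values are invented.

```python
from collections import defaultdict
from statistics import mean


def haplotype_means(haplotypes, phenotypes):
    """Average a trait over the lines carrying each haplotype."""
    groups = defaultdict(list)
    for line, hap in haplotypes.items():
        groups[hap].append(phenotypes[line])
    return {hap: mean(values) for hap, values in groups.items()}


def best_haplotype(haplotypes, phenotypes):
    """Return the haplotype with the highest mean trait value."""
    means = haplotype_means(haplotypes, phenotypes)
    return max(means, key=means.get)


# Invented haplotype calls and trait values (arbitrary units) for four lines.
hap_calls = {"L1": "H1", "L2": "H1", "L3": "H2", "L4": "H2"}
trait = {"L1": 40.0, "L2": 44.0, "L3": 50.0, "L4": 52.0}
means = haplotype_means(hap_calls, trait)
superior = best_haplotype(hap_calls, trait)
```

For traits where a lower value is desirable, `max` would be replaced by `min`.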

The construction of haplotype blocks typically makes use of the
following three methods: (1) user-defined length, (2) sliding window,
and (3) linkage disequilibrium (LD). Using haplotype blocks of a
user-defined set length (2-15 bp) is the simplest approach; however, the
resulting haplotypes do not reflect genomic factors such as crossover or
LD (Sehgal et al. 2020), nor do they reflect a common evolutionary
process (Templeton et al. 2004). The sliding-window method is by far the
most popular choice among GWAS researchers for constructing haplotypes
(Sehgal et al. 2020). It is simple and straightforward to use, but when
neighboring SNPs are strongly linked to each other, it produces
redundant information and is therefore no more helpful than using SNPs
alone (Sehgal et al. 2020). Relatedly, it is challenging to determine
the optimal window size for a genome-wide scan when LD levels differ
across large genomic regions (Sehgal et al. 2020). For finding instances
of past integration in the population of interest, the LD-based approach
stands out as the most effective (Qian et al. 2017; Sehgal et al. 2020).
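The three strategies can be contrasted in a short sketch. It makes several simplifying assumptions that are not taken from the cited studies: block length is counted in SNPs rather than base pairs, alleles are coded 0/1 on phased haplotypes, and the LD rule simply extends a block while each SNP has r² at or above a threshold with its predecessor. That rule is a deliberately simple stand-in, not the algorithm of any particular software package.

```python
def fixed_length_blocks(n_snps, length):
    """Non-overlapping blocks of `length` consecutive SNPs (last may be shorter)."""
    return [list(range(s, min(s + length, n_snps))) for s in range(0, n_snps, length)]


def sliding_window_blocks(n_snps, window, step=1):
    """Overlapping blocks of `window` SNPs, advancing `step` SNPs at a time."""
    return [list(range(s, s + window)) for s in range(0, n_snps - window + 1, step)]


def r_squared(a, b):
    """r^2 between two biallelic SNPs coded 0/1 across the same haplotypes."""
    n = len(a)
    pa, pb = sum(a) / n, sum(b) / n
    pab = sum(x * y for x, y in zip(a, b)) / n
    d = pab - pa * pb
    denom = pa * (1 - pa) * pb * (1 - pb)
    return 0.0 if denom == 0 else d * d / denom


def ld_blocks(snps, threshold=0.8):
    """Start a new block whenever r^2 with the previous SNP drops below threshold."""
    if not snps:
        return []
    blocks = [[0]]
    for i in range(1, len(snps)):
        if r_squared(snps[i - 1], snps[i]) >= threshold:
            blocks[-1].append(i)
        else:
            blocks.append([i])
    return blocks


# Invented data: four SNPs scored on eight haplotypes.
snps = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]
by_length = fixed_length_blocks(len(snps), 2)
by_window = sliding_window_blocks(len(snps), 2)
by_ld = ld_blocks(snps)
```

On this toy data the sliding window returns a middle block that straddles two unlinked SNP pairs, whereas the LD rule places the boundary where linkage breaks down, which is the contrast drawn in the paragraph above.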

According to an investigation of haplotype blocks in wheat by Brinton et
al. (2020), seven haplotypes (H1, H2, ..., H7) were identified that
included the gene TaGW2-A in a highly conserved genetic region of
chromosome 6A responsible for increased yield characteristics. As the
two SNP markers based on the promoter region of this gene could not
discriminate the haplo-blocks, the haplotype block provided more
gene-associated markers for complete reliability (Varshney et al.
2021b). Luján Basile et al. (2019) characterized haplotype blocks and
performed GWAS in Argentinian bread wheats using molecular markers and
SNP profiling, and revealed that several haplotype blocks span the
genome and include conserved genetic regions, e.g., the 1BL/1RS
wheat/rye translocation site on chromosome 1BS (e.g., in Chinese wheats;
see Ru et al. 2020). Moreover, most of the haplotypes identified had
significant effects on yield attributes in multi-locational breeding
trials. For spring wheat genetic resources, Sehgal et al. (2020) applied
a haplotype-based GWAS targeting epistatic interactions in
multi-locational breeding trials at CIMMYT (Mexico). This study aimed to
explore stable genomic regions of the haplotypes for improved yield
components and haplotype interactions, and used LD approaches, as
numerous haplotype blocks were designed to span >14 Mb of the wheat
genome. The haplotype-based GWAS revealed stable associations under
drought stress environments with chromosomal hotspots. These studies
support the need to develop and deploy genetic markers for crop
development that rely on haplotypes rather than single SNPs alone.
Because whole-genome sequencing data for breeding line collections are
expanding in a variety of crops, it can be anticipated that the HBB
method will continue to be used in the years to come (Varshney et al.
2021a).

Figure 13.1 describes the integrative techniques that can be used either
to add beneficial allelic variants to wheat genetic resources or to
remove harmful allelic variants from them in preparing future crop
breeding strategies. The germplasm collections stored in gene banks
include both advantageous and detrimental alleles. Combining
high-throughput sequencing with multi-omics assays and field phenotyping
offers a valuable tool for connecting genomic variants with key
phenotypes. Knowledge of the genes responsible for important plant
characteristics paves the way for haplotype-based breeding or de novo
domestication (Qian et al. 2017; Bhat et al. 2021; Varshney et al.
2021b). In this regard, the speed breeding (SB), or fast-forward
breeding, approach will help accelerate advances in crop breeding
pipelines. The HBB strategy requires monitoring haplotype transfer
through breeding lineages as a crucial step in creating novel genomic
variants, because it helps select the appropriate parents for producing
offspring with the desired traits. Incorporating genomic information
when defining the recombinants formed by mating distinct sets of parents
can help simplify the analysis of traits of interest, in particular
complex traits such as adaptation to harsh environments (Jensen et al.
2020), where it is necessary to distinguish whether a correlation
between different traits is attributable to genuine linkage among genes
or to the pleiotropic action of a given set of genes (Bhat et al. 2021;
Dixon et al. 2020). In crops whose genomes include extensive linkage
disequilibrium (LD) blocks, an HBB method becomes more pertinent, since
the LD blocks can be regions of conserved genetic variation.


13.2.2 Involvement of Speed Breeding 
in Haplotype Mapping 
for Wheat Genetic Resources 


In plant breeding, the generation time of a crop is a major factor in
stabilizing homozygous lines with enhanced genetic gain through
hybridization and conventional breeding schemes. Approaches such as
doubled haploids, shuttle breeding, and embryo tissue culture can help
to minimize the generation time (Bhat et al. 2021). However, some major
crops are to some extent intractable to doubled haploid techniques.
Moreover, genetic linkage, recombination, and the lack of dedicated
plant organ and tissue culture infrastructure call for additional
breeding avenues for fixing genes. The development of a new and more
sophisticated breeding method known as speed (fast) breeding (SB) has
made it feasible to hasten agricultural innovation by shortening the
plant phenological cycle and accelerating generation advancement (Ghosh
et al. 2018; Watson et al. 2018). Speed or fast-forward breeding
programs can be deployed in several ways, such as extending the light
exposure time of a given crop species and harvesting grain as soon as it
becomes available for fast propagation, which reduces the amount of time
it takes for certain day-neutral and/or long-day plants to produce new
generations (Ghosh et al. 2018; Watson et al. 2018). The basic principle
of wheat SB is to induce early flowering by manipulating the photoperiod
(day length) and temperature (vernalization or cold requirement) under
controlled conditions (Ghosh et al. 2018). In this way, haplotypes and
improved new varieties of the same species can be developed by
synchronizing flowering time (anthesis) and introgressed into
marker-assisted molecular breeding programs coupled with abiotic stress
tolerance (Song et al. 2022; Gahlaut et al. 2023). Under SB conditions,
it is possible to match the flowering times of both wheat parents
involved in a cross and to propagate subsequent generations in very
little time and space. Moreover, such accelerated generation times in
this polyploid crop enable phenotypic screening of transformants for
further selection and marker-aided investigation to improve grain yield,
nutritional quality, other beneficial traits, and flowering time, as
well as adaptation to both environmental instabilities and disease
pressures (Watson et al. 2018). In addition to screening wheat lines for
abiotic and biotic stress responses, SB protocols can be adapted for
rapid screening of populations even in the off season, with screening
done early in the life cycle of each plant generation (Alahmad et al.
2018; Ghosh et al. 2018). This is advantageous for breeding procedures,
especially for pyramiding beneficial/resistance genes for the production
of climate-smart wheat. Speed breeding acts as a bridge to utilizing
superior haplotypes with exotic and adaptive alleles for haplotype-based
breeding (HBB), genomic selection (GS), and genome editing. Using SB
approaches, accelerated generation advancement can deliver an improved
variety after high-throughput phenotyping, marker-assisted selection
(MAS), genotyping, and sequencing (Fig. 13.2). In polyploid crops,
haplotype phasing and scaffolding are becoming more advantageous as a
result of increased chromosomal configuration monitoring (Zhang et al.
2019), sequencing, and Bionano Genomics (BNG) optical mapping-based
genomic assemblies. SB coupled with single seed descent (SSD) for
generation advancement of haplotypes and other bi- and/or multi-parental
breeding populations enhances molecular marker-aided breeding (MAB) and
precise genome editing for the desired trait(s).
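Why generation time matters for fixing homozygous lines can be made concrete with the standard expectation for selfing: at a locus that is heterozygous in the F1, each generation of self-pollination halves the expected heterozygosity. The sketch below applies that rule; the generations-per-year figures in the usage lines are placeholder values chosen for illustration, not measurements from the studies cited.

```python
def expected_homozygosity(selfings: int) -> float:
    """Expected homozygous fraction after `selfings` generations of selfing an F1."""
    return 1.0 - 0.5 ** selfings


def selfings_needed(target: float) -> int:
    """Smallest number of selfing generations reaching `target` homozygosity."""
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in the range [0, 1)")
    k = 0
    while expected_homozygosity(k) < target:
        k += 1
    return k


def years_required(selfings: int, generations_per_year: float) -> float:
    """Calendar time for `selfings` generations at a given turnover rate."""
    return selfings / generations_per_year


after_three = expected_homozygosity(3)
needed = selfings_needed(0.95)
# Placeholder turnover rates: a slow scheme versus an accelerated one.
slow = years_required(6, 2.0)
fast = years_required(6, 6.0)
```

The number of selfing generations is fixed by genetics, so any gain has to come from the turnover rate, which is the quantity speed breeding and single seed descent act on.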
Fig. 13.2 Involvement of speed breeding in haplotype mapping to generate
improved variety. This figure was prepared using BioRender application
(https://biorender.com/)


13.3 Conclusion and Future 
Perspective 


Breeding, especially of main crops such as wheat, is as old as human
history, and the focus on selection mainly for yield and high quality
has tended to restrict the genetic diversity of modern wheat. The region
in which domesticated wheat originated, namely Mesopotamia in the Harran
region of Turkey, nevertheless has a very large gene pool of Triticeae
species with characteristics that allow growth under challenging
environmental conditions as well as coping with multiple biotic factors.
As detailed in Chap. 12, it is clear that these valuable abilities can
be recovered in domesticated wheat varieties through alien
introgression. In the present chapter, we have argued that molecular
technologies can be harnessed in the form of haplotype mapping,
combining selection based on haplotype signatures with speed breeding
approaches as a primary genomics-assisted breeding strategy for complex
traits.
Wheat Sequencing: The Pan-Genome and Opportunities for Accelerating
Breeding
Abstract


Wheat is a crucial crop globally, with widespread cultivation and
significant economic importance. To ensure food security amidst the
increasing human population and new production challenges, such as
climate change, it is imperative to develop novel wheat varieties that
exhibit better quality, higher yield, and enhanced resistance to biotic
and abiotic stress. To achieve this, leveraging comprehensive genomic
resources from global breeding programs can aid in identifying
within-species allelic diversity and selecting optimal allele
combinations for superior cultivars. While previous single-reference
genome assemblies have facilitated gene discovery and whole-genome level
genotype-phenotype relationship modeling, recent research on variations
within the pan-genome of all individuals in a plant species underscores
their significance for crop breeding. We summarize the different
approaches and techniques used for sequencing the large and intricate
wheat genome, while highlighting the challenge of generating
high-quality reference assemblies. We discuss the computational methods
for building the pan-genome and research efforts that are aimed at
utilizing the wheat pan-genome in wheat breeding programs.


A. N'Diaye · C. Pozniak (✉)
University of Saskatchewan, Crop Development Centre, Saskatoon,
Saskatchewan, Canada
e-mail: curtis.pozniak@usask.ca

A. N'Diaye
e-mail: amidou.ndiaye@usask.ca

S. Walkowiak
Canadian Grain Commission, Grain Research Laboratory, Winnipeg,
Manitoba, Canada
e-mail: sean.walkowiak@grainscanada.gc.ca

© The Author(s) 2024


Keywords 


Wheat breeding · Sequencing · Pan-genome · Accelerated breeding


14.1 Introduction

In the early 2000s, technological advances in 
DNA sequencing allowed the sequencing and 
the comparison of the genomes from several 
individuals of the same species (Medini et al. 
2005). This helped fuel the notion that an indi- 
vidual genome is insufficient to serve as an 
appropriate genomic reference, since it does 
not capture the diversity that represents the spe- 
cies. The idea emerged of a "pan-genome" that 
encompasses the genomic information of sev- 
eral representative individuals. Pan-genomics 
was initially applied to many smaller and sim- 
ple genomes of microbial species, particularly 
to understand presence/absence variation (PAV)
in genes (Medini et al. 2005). The idea of the
pan-genome has since been applied to diverse 
species across all taxonomic kingdoms and has 
evolved to consider all possible variation present 
between genomes, including non-genic, PAV, 
copy number, and structural variation (Jayakodi 
et al. 2021). Pan-genomics has also been applied 
more broadly to groups of related species or 
genera, for “super pan-genomes.” While still in 
its infancy, pan-genomics of crop species can 
be particularly valuable for harnessing genomic 
variants and increasing rates of crop improve- 
ment. The application of pan-genomes in crop 
breeding is gaining increased interest due to 
the importance of food security and the need 
for more efficient and effective breeding meth- 
ods. To date, pan-genomes have been applied 
to the improvement of various crops, including 
barley, maize, rice, tomato, and soybean (Gao 
et al. 2019; Gui et al. 2022; Jayakodi et al. 2020; 
Liu et al. 2020; Shang et al. 2022; Zhao et al. 
2018). Applications of pan-genomics for wheat 
improvement have also become possible since 
the completion and the public release of multi- 
ple high-quality reference genomes (Walkowiak 
et al. 2020). 

Wheat is a crucial crop globally, with wide- 
spread cultivation and significant economic 
importance, supplying a fifth of global calo- 
ries and protein (Dixon 2007; Shiferaw et al. 
2013). To maintain food security in the context 
of exponential growth of the human popula- 
tion while facing new challenges (e.g., global 
warming and climate change) in production, it 
is essential to create new wheat varieties with 
increased yield, better quality, and resistance or 
tolerance to abiotic and biotic stress (Abberton 
et al. 2016; Batley and Edwards 2016). Early
wheat improvement relied on traditional breed- 
ing methods, where wheat lines were phenotypi- 
cally selected in field trials, which is both costly 
and labor intensive. As our understanding of 
wheat genetics improved, it became possible to 
identify major effect genes underlying qualita- 
tive traits and to select for these genes through 
marker-assisted selection (MAS, see also Chap. 
9). Marker-assisted selection has been successfully
applied to certain traits, particularly disease
resistance (Miedaner and Korzun 2012).
Unfortunately, many key traits, including yield,
are complex and polygenic. Selection for
quantitative traits that are influenced by
non-genic features, several genes, or gene
interactions requires more advanced tools for
making DNA-based selections. With the recent
availability of high-quality genome assemblies
and gene annotations for wheat, it has become
possible to apply high-throughput genotyping
arrays or genotyping-by-sequencing methods to
gather genome-wide variation information and
select for these complex traits at the
whole-genome level through genomic selection
(GS) (Haile et al. 2021).
Nevertheless, identifying key major effect genes 
as well as the mechanisms underpinning more 
complex traits requires a deeper understand- 
ing of the diversity of wheat and the impact of 
genomic variation on phenotypic traits. It is crit- 
ical to understand the diversity within wheat that 
is available to breeders in order to make breed- 
ing more efficient, identify suitable parents to 
use in targeted crosses, and select for the best 
possible combination of genes for rapid trait 
enhancement. 

Despite its importance for food security, the 
application of genomics and pan-genomics for 
wheat improvement has been challenged by the 
large size and the complexity of its genome. 
The genome is composed of three distinct diploid
subgenomes, resulting in allohexaploidy
(genome AABBDD), where the ‘A’ subgenome
was derived from T. urartu, the ‘B’ subgenome
from a species related to Ae. speltoides, and the
‘D’ subgenome from Ae. tauschii. The genome
of modern bread wheat is estimated to be 17 
gigabase-pairs (Gb) in length and is composed 
of ~90% repetitive elements. Recent achieve- 
ments in genome sequencing and assembly 
technologies have enabled the release of mul- 
tiple wheat genomes and tools to create a pan- 
genome, which is inspiring a new age of wheat 
breeding. In this review, we explore the concept 
of pan-genomes and a pan-genome of wheat, the 
history and evolution of the wheat genome and
pan-genome, and the future outlook of wheat 
pan-genomics for research and applied breeding. 


14.2 Motivations for Studying 
Pan-Genomes in Crop Breeding 


During the last decade, there have been signifi- 
cant advancements in next-generation sequenc- 
ing (NGS) technologies, which offer a direct 
view into DNA variation. These advancements 
have created numerous possibilities to investi- 
gate the connection between genotype and phe- 
notype with greater precision than ever before. 
NGS has been used for various projects, includ- 
ing gene expression analysis, polymorphism 
detection, and the development of molecular 
markers (Barabaschi et al. 2012; Delseny et al. 
2010). With the advent of affordable genome 
sequencing, breeders have started using NGS to 
sequence extensive groups of plants, which has 
enhanced the precision of identifying quantita- 
tive trait loci (QTL) and simplified the process 
of discovering genes. This has, in turn, formed 
the foundation for creating models to compre- 
hend complex genotype-phenotype relation- 
ships at the whole-genome level. Over the past 
two decades, advancements in sequencing tech- 
nologies, assembly techniques, and computa- 
tional algorithms have enabled the release of 
genome sequences for over 700 plant species 
(Sun et al. 2022). 

In parallel, the use of DNA-based tools for
plant breeding, such as MAS and GS, has
progressed significantly. Genomics approaches
identified genomic regions associated with
traits, which were termed QTL (Geldermann
1975). A single QTL can harbor many genes
within the same locus (Beckmann and Soller
1983; Westman et al. 1997). MAS has been in
use since the early 1990s and involves
identifying, in silico, genomic markers that lie
within causal genes for traits or are closely
linked to them, and then using these markers to
select individuals (Tanksley and Nelson 1996).

The development of reference genome
assemblies has expedited the process of
identifying candidate genes for in-demand traits.
These assemblies serve as a basis for pinpoint- 
ing single-nucleotide polymorphisms (SNPs), 
copy number variations (CNVs), and inser- 
tion-deletions (InDels) within an individual's 
DNA sequence. The markers were used as the 
basis for conducting genome-wide association 
studies (GWAS) and genomic selection (GS), 
which involve comparing diversity panels with 
reference genomes to identify statistical asso- 
ciations between markers and traits (Crossa 
et al. 2017; Hayes and Goddard 2010; Varshney 
et al. 2009). Despite providing a greater insight 
into the diversity of plant species, particularly at 
the SNP level (Gore et al. 2009; McNally et al. 
2009), reference genomes cover only a limited 
portion of the overall genomic space of a species 
and are inadequate in capturing variation across 
every individual within a given crop species 
(Bayer et al. 2020). A paradigm shift is occur- 
ring due to new advancements in genomics, 
which now take into account the significance 
and amount of structural variations present in 
the pan-genome of crop species. This includes 
capturing all types of SVs such as PAVs, CNVs,
and repetitive elements or transposable elements
(TEs), present throughout the entire genome of
all individuals belonging to a plant species
(Danilevicz et al. 2020; Golicz et al. 2016;
Tao et al. 2019). By cataloging this variation
and linking it to phenotypic/trait
information, it is then possible to select
parents and candidate wheat lines in breeding 
programs with more advanced knowledge and 
decision support tools, allowing for more effi- 
cient and targeted crop improvement. 


14.3 Historical Challenges 
and Progress in Wheat Genome 
Sequencing and Assembly 


Prior to the availability of NGS, whole-genome 
sequencing was performed using the Sanger 
sequencing technology. Due to a combination 
of several factors, including the cost and low 
throughput of Sanger sequencing, and the size 
and complexity of some large genomes, many 
genomes were first cloned into bacterial artificial 
chromosomes (BACs) that included a few hundred
thousand base-pairs per clone. This allowed
each BAC to be sequenced and assembled in
parallel and then stitched together to assemble
larger, more complex genomes. After the release
of the first human genome sequence in 2000,
which was achieved through the use of BACs
(Lander 2001; Venter et al. 2001), the
Arabidopsis genome was
the first plant genome to be sequenced using 
this approach. This was followed by the com- 
pletion of multiple versions of the rice genome 
two years later (Goff et al. 2002; Yu et al. 2002). 
The wheat genome’s larger size, almost 40 times 
that of rice, and its complexity, which included 
a high proportion of repetitive sequences and 
homoeologous DNA copies from three sub- 
genomes, made it economically unfeasible 
to employ a standard sequencing method. To 
tackle this challenge, the International Wheat 
Genome Sequencing Consortium (IWGSC) was 
established in 2005. The consortium divided 
the immense task among 20 countries based 
on chromosomes and chromosome arms. The 
approach employed genetic stocks that could 
be differentiated by flow cytometry on an indi- 
vidual chromosome basis (Consortium et al. 
2014). Physical maps and minimum tiling paths 
were produced by fingerprinting BAC librar- 
ies, which were subsequently sequenced and 
assembled (Šafář et al. 2010). Although the
chromosome-by-chromosome approach was 
adopted, it took nearly ten years to implement 
and was only partially accomplished for a few
chromosomes, including chromosome 3B (Paux
et al. 2008). Due to the large size of the
hexaploid wheat genome, some researchers
opted to pursue a different approach by focusing
on the genomes of related diploid species,
such as Ae. tauschii. This species has a much
smaller genome (~4.792 Gb), approximately
one-third the size of that of hexaploid wheat, and
does not have any interference from homoeolo- 
gous DNA copies during physical mapping and 
eventual sequence assembly. Even with this
approach, the initial use of regular agarose gels
made the task seem overwhelming. Higher-throughput
technologies, such as SNaPshot BAC fingerprinting
and the Illumina Infinium SNP array, were later
utilized to anchor contigs.
It took a decade to produce the first
version of the Ae. tauschii physical map, which 
involved fingerprinting 461,706 BAC clones and 
assembling them into 2263 contigs. Afterward, 
7185 molecular markers were utilized to anchor 
these contigs onto a genetic map (Luo et al.
2013). Despite some success with Ae. tauschii, 
the BAC approach had limited achievement in 
hexaploid wheat and the approach was slowly 
abandoned for wheat once more advanced DNA 
sequencing, sequencing library preparation, 
and genome assembly technologies became 
available. 

In the 2000s, wheat genome sequencing 
was boosted by Illumina sequencing technolo- 
gies, which were able to perform short read 
paired-end sequencing at high depth and low 
cost. The sequencing was first done on the dip- 
loid ancestors of common wheat due to their 
smaller genome size and early challenges of 
applying short read data to large polyploid
genomes. The draft genome assembly for Ae. 
tauschii, the D genome donor for bread wheat, 
was completed using short read sequencing 
methods to about 90× coverage (Jia et al. 2013).
Approximately 83.4% of the genome was covered
by the assembled scaffolds, and out of
these, 65.9% were identified as transposable ele- 
ments (TEs). Using RNA-seq data from differ- 
ent tissues, a total of 43,150 protein-encoding 
genes were identified. A comparable approach 
was employed to construct the genome sequence 
of the A genome contributor, T. urartu. The 
assembly that was obtained had a total length 
of 3.92 Gb, which corresponds to 79.35% of 
the estimated size of the A genome (4.94 Gb). 
However, due to subgenome interactions 
and evolutionary processes spanning around 
10,000 years, the genomes of the progenitors are 
not able to fully depict their counterparts in the 
common wheat genome. Sequencing of the
common wheat genome itself was therefore still
required.

The first sequencing of the common 
wheat genome for the landrace CHINESE 
SPRING was accomplished using Roche 454 
pyrosequencing, specifically the GS FLX
Titanium and GS FLX+ platforms, which were
used to sequence the wheat genome to about
5× coverage. Sequencing of related progenitors
was also performed using various platforms,
such as Illumina methods for sequencing of T.
monococcum, the A genome donor of bread 
wheat. Likewise, Ae. tauschii was sequenced 
using the Roche 454 sequencing platform. While 
whole-genome data was not yet available, cDNA 
sequences were sequenced from Ae. speltoides, 
which has a genome similar to the B genome. 
Using the SOLiD sequencing platform, addi- 
tional short reads of CHINESE SPRING were 
generated. These yielded 95,000 predicted gene 
models, with most of them designated to either 
the A, B, or D subgenome. Despite its high 
degree of fragmentation, the draft genome was 
still considered valuable, as it was the first wheat 
genome available for community use (Brenchley 
et al. 2012). 

As the IWGSC adopted the chromosome- 
based BAC sequencing approach, progress was 
consistently made. As NGS became available, 
it was possible to sequence the BACs using 
more high-throughput methods. The approach 
involved developing sequencing libraries from 
the DNA of individual chromosomes or their 
arms and subsequently sequencing paired-end
reads on the Illumina HiSeq 2000 platform. The 
assembly obtained, which resembled the 454 
assembly, comprised approximately 500,000 
contigs with N50 values ranging from 1.7 to 
8.9 kb. Its total size was 10.2 Gb. These con- 
tigs, taken together, make up 61% of the esti- 
mated hexaploid wheat genome. Predictions 
were made for a total of 133,090 high confi- 
dence genes, as well as 890,576 low confidence 
genes. Using a genetic map, just over half of the 
high confidence genes were assigned genetic 
positions (Mascher et al. 2013), allowing them 
to be considered within the context of the telo- 
some-based assembly resources for each chro- 
mosome arm. This led to the completion of a 
draft genome assembly of wheat, known as the 
IWGSC chromosome survey sequence (CSS) 
assembly (Consortium et al. 2014). 

The IWGSC also accomplished a noteworthy
feat when they generated a reference-level
sequence of chromosome 3B (Choulet et al. 
2014). This high-quality sequence was cre- 
ated using a minimum tiling path consisting of 
8452 BACs, spanning 774 Mb, and contain- 
ing 5326 protein-coding genes as well as 85% 
of TEs. Additionally, a molecular-genetic map 
(CHINESE SPRING × RENAN) was used for
long-range orientation of DNA sequences. The 
assembly of chromosome 3B demonstrated 
the success of the chromosome-based BAC 
sequencing strategy, although the assembly 
remained approximately 7% incomplete.
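
The notion of a minimum tiling path used above can be illustrated with a small sketch. It assumes, purely for illustration, that every BAC clone has already been placed on a common coordinate axis (in practice, clone order and overlap are inferred from fingerprints) and applies a standard greedy interval-cover strategy. The function name, clone names, and coordinates are hypothetical, and this is not the pipeline used by the IWGSC.

```python
def minimum_tiling_path(clones, start, end):
    """Greedy interval cover: choose the fewest clones spanning [start, end).

    clones: list of (name, clone_start, clone_end) tuples on one coordinate axis
            (hypothetical positions; real maps infer them from fingerprints).
    Returns the chosen clone names in order along the region.
    Raises ValueError if some position cannot be covered by any clone.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path = []
    covered = start  # everything left of this position is already tiled
    i = 0
    while covered < end:
        best = None
        # Among clones that start at or before the covered point,
        # keep the one that reaches farthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best[0])
        covered = best[2]
    return path


# Toy example with five overlapping clones (coordinates in kb, hypothetical).
toy_clones = [
    ("BAC_a", 0, 120),
    ("BAC_b", 50, 150),
    ("BAC_c", 100, 260),
    ("BAC_d", 140, 300),
    ("BAC_e", 250, 400),
]
path = minimum_tiling_path(toy_clones, 0, 400)
print(path)  # ['BAC_a', 'BAC_c', 'BAC_e']
```

Only three of the five toy clones need to be sequenced to span the region, which is the saving a tiling path is meant to deliver.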


14.4 The Completion of a 
Chromosome-Scale Assembly 
of Hexaploid Wheat 


While evidence suggested the BAC sequenc- 
ing approach could work for achieving a chro- 
mosome-based wheat genome assembly, the 
complexity of the genome, high repeat content, 
high transposon activity, large genome size, 
and allopolyploidy were continuing to hamper 
assembly efforts. Meanwhile, third-generation 
sequencing technologies, which were created 
by Pacific Biosciences (PacBio) and Oxford 
Nanopore Technologies (ONT), surfaced and 
progressed quickly. These techniques produce 
reads with substantially longer lengths and have 
been extensively employed, in combination with 
established assembly algorithms, to construct 
intricate and sizable plant genomes with unpar- 
alleled precision (Cheng et al. 2021; Koren et al. 
2017; Niu et al. 2022). This led to a paradigm 
shift away from BAC sequencing and toward the 
direct shotgun sequencing of the genome using 
more advanced sequencing technologies and 
assembly algorithms. 

A new assembly method called MaSuRCA 
was used to assemble wheat using a hybrid 
approach that combined the strengths of both 
PacBio long reads, which have high error rates, 
and Illumina short reads, which are more accu- 
rate. This method was initially used to create a 
genome assembly of Ae. tauschii (Zimin et al.
2017a). To obtain a comprehensive sequence 
coverage of the genome, a combination of 
sequencing methods was employed, including
over 19 million PacBio reads providing
approximately 38× coverage of the D genome,
177× coverage from Illumina HiSeq 2500 reads
consisting of 200-base paired-end reads, and
MiSeq reads consisting of 250-base paired-end
reads. The sequencing libraries with a range of
insert sizes yielded a total coverage of 200× of
the genome. The genome’s quality was validated 
through a comparison with optical maps and 
BAC assemblies that were produced indepen- 
dently. Subsequently, the pipeline was utilized 
to produce the initial near-complete hexaploid 
wheat genome for CHINESE SPRING (Zimin 
et al. 2017b). Triticum 1.0 was a genome assem- 
bly consisting of 829,839 contigs with a total 
size of 17.05 Gb, with a contig and scaffold N50 
of 76.3 kb and 101.2 kb, respectively. Another 
method involved assembling long reads directly 
with the FALCON assembler, which produced 
FALCON Trit1.0 with a size of 12.94 Gb.
Although this version was shorter than the
MaSuRCA-assembled version, it had a longer
contig N50 of 215.3 kb. Using the genome
alignment tool MUMmer (Kurtz et al. 2004), the 
combination of Triticum 1.0 and Trit1.0 resulted 
in a final assembly that spans almost the entire 
wheat genome, with a size of 15.3 Gb and a con- 
tig N50 of 232.6 kb. 
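
Because contig and scaffold N50 values are used throughout this section to compare assemblies, a minimal sketch of the calculation may help. N50 is the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the assembly size. The contig lengths below are toy values chosen for illustration, not data from any of the assemblies discussed.

```python
def n50(lengths):
    """Return the N50 of a list of contig (or scaffold) lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # First contig at which the cumulative length reaches half the total.
        if 2 * running >= total:
            return length
    raise ValueError("N50 is undefined for an empty assembly")


# Toy assemblies (lengths in kb, hypothetical).
larger_but_fragmented = [50] * 8 + [20] * 10   # total 600 kb
smaller_but_contiguous = [200, 150, 100, 50]   # total 500 kb

print(sum(larger_but_fragmented), n50(larger_but_fragmented))    # 600 50
print(sum(smaller_but_contiguous), n50(smaller_but_contiguous))  # 500 150
```

The toy numbers mirror the situation described above: an assembly can be shorter in total length and still have a longer N50 if its sequence is concentrated in fewer, longer contigs.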

At the same time, an alternative approach 
was also taken to create the CHINESE SPRING 
genome assembly using short reads (Clavijo 
et al. 2017). The approach involved 1.1 billion 
250-bp paired-end reads (33× genome coverage)
from CHINESE SPRING short insert
libraries, and 68× coverage of long insert
libraries, yielding the TGACv1 version of the
wheat genome assembly. This version spanned
13.43 Gb and accounted for over 78% of the
wheat genome. In addition to the improved 
assembly, strand-specific Illumina RNA-seq 
and PacBio full-length cDNAs were combined 
to achieve better annotation. Although chromo- 
some-level assembly was not attained, this new 
wheat genome assembly was now available for 
the broader scientific community to utilize,
bringing the prospect of a high-quality reference 
genome into focus. 

Shortly thereafter, a breakthrough was made 
with the release of new short read assem- 
blers. NRGene's DeNovoMagic (NRGene, 
Ness Ziona, Israel) algorithm and the TRITEX 
pipeline (Monat et al. 2019) for short read
assemblies demonstrated that a shotgun whole- 
genome sequencing approach could be achieved 
when combining different Illumina library sizes 
and preparation methods. The AABB genome 
of wild emmer wheat (WEW), which represents 
the reference-level genome of polyploid wheat, 
was produced through the utilization of the 
DeNovoMagic algorithm (Avni et al. 2017). By 
sequencing on Illumina HiSeq 2500 machines, a 
total of 2.1 terabase-pairs (Tb) was generated,
corresponding to 176× genome coverage across five
libraries. The insert sizes in the libraries ranged 
from 450 bp to 10 kb. The scaffolds were then 
consolidated using a high-density molecular- 
genetic linkage map and additional reads from 
a three-dimensional (3D) conformation capture 
Hi-C library. Ultimately, the final assembly was 
10.5 Gb, accounting for 87.5% of the predicted
tetraploid wheat genome. The annotation of
110,544 gene models provided strong evidence
for the high quality of this genome assembly.
Among these models, 58.8% (65,012) were
identified as high confidence gene models, while
the remaining 41.2% were of low confidence.
This assembly successfully captured 98.4% of
the total expected gene sets of WEW, as verified 
by BUSCO (Simão et al. 2015). Additionally, it 
was utilized for identifying the genes that played 
a role in the early domestication of wheat, as 
reported by Avni et al. (2017). After the comple- 
tion of the WEW genome, bread wheat genome 
sequencing efforts quickly pivoted toward the 
same shotgun genomics approach. The suc- 
cessful completion of the bread wheat genome 
IWGSC RefSeq v1.0 was achieved using a 
combination of similar techniques and software.
According to Consortium et al. (2018),
DeNovoMAGIC2 utilized the complete genome
as the primary framework and incorporated 
various sources of data such as physical maps, 
genotyping-by-sequencing data, and Hi-C data.
The common wheat genome was assembled 
into 21 pseudomolecules at the chromosome 
scale, which were assigned to the subgenomes 
A, B, and D. This resulted in a genome assem- 
bly with a super-scaffold N50 of 22.8 Mb, and 
total length of 14.5 Gb. Using a similar assem- 
bly approach, the genome sequencing of durum 
wheat (DW) was completed shortly after 
(Maccaferri et al. 2019). 
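
The coverage figures quoted in this section follow from simple arithmetic: sequencing depth is the total number of sequenced bases divided by the genome size. As a rough consistency check, the sketch below uses only the wild emmer wheat numbers given above (2.1 Tb of sequence, a 10.5 Gb assembly representing 87.5% of the predicted genome). The function names are illustrative, and the calculation ignores read filtering and other practical details.

```python
def implied_genome_size(assembly_size_bp, fraction_assembled):
    """Genome size implied by an assembly that covers a given fraction of it."""
    return assembly_size_bp / fraction_assembled


def sequencing_depth(total_bases, genome_size_bp):
    """Average depth (fold coverage) = sequenced bases / genome size."""
    return total_bases / genome_size_bp


# Wild emmer wheat figures as reported in the text.
wew_genome = implied_genome_size(10.5e9, 0.875)  # about 12 Gb
wew_depth = sequencing_depth(2.1e12, wew_genome)

print(f"implied genome size: {wew_genome / 1e9:.1f} Gb")  # 12.0 Gb
print(f"implied depth: {wew_depth:.0f}x")                 # 175x
```

The implied depth of roughly 175× agrees closely with the reported 176× coverage, so the quoted sequence yield, assembly size, and assembled fraction are mutually consistent.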


14.5 Progress Toward a Wheat 
Pan-Genome 


In 2018, the IWGSC released the first reference- 
quality genome sequence for the wheat landrace 
CHINESE SPRING, which marked a significant 
change in the use of genomics as a research tool 
for wheat. The publication enabled the wider 
research community to have easy access to this 
tool (Consortium et al. 2018). The CHINESE 
SPRING genome assembly was a major mile- 
stone in wheat genomics research, and within 
a few years, it has already laid the foundation 
for countless studies dissecting the genome to 
understand wheat biology. However, CHINESE 
SPRING shares only a distant ancestral con- 
nection with the majority of current wheat 
varieties. Additionally, due to the considerable 
diversity present within the species, a single 
genome sequence is insufficient for fully rep- 
resenting its genetic makeup. Additional pan- 
genome information is required to identify new 
genetic diversity that can enhance traits and 
understand the mechanisms behind the traits present
in elite wheat cultivars. Fortunately, with
new short-read assembly algorithms capable of
handling whole-genome shotgun data, generating
additional genomes was no longer technically
limiting.

Choosing crop genotypes for pan-genome 
analysis is a challenging task as the objective 
is to encompass a wide range of genetic varia- 
tions using a limited number of representative 
genotypes for the particular species. This selec- 
tion procedure necessitates the acquisition of 
genome-wide genotypic data from either entire 
genebank collections or representative subgroups
that cover all significant germplasm
groups within the species. Recent reports have 
described several genebank genomics stud- 
ies on rice (Wang et al. 2018), barley (Milner 
et al. 2019), and wheat (Juliana et al. 2019).
Soleimani et al. (2020) have described different
methods that can be used to choose core
sets for pan-genome analysis. One tool that 
aims to maximize diversity, representative- 
ness, and allelic richness of core sets is Core 
Hunter (De Beukelaer et al. 2018). It achieves 
this by using various algorithms that operate 
on genetic distance matrices. To further cus- 
tomize the selection process, clustering of the 
diversity space through principal component 
analysis (Patterson et al. 2006) or model-based 
ancestry estimation (Alexander et al. 2009) can 
be used. Pan-genome panels offer the possibil- 
ity of incorporating not only cultivated plant 
varieties but also wild progenitors or ancestors 
of polyploid species. Examples include teosinte,
a wild progenitor of maize, and wild emmer and
Aegilops tauschii, progenitors of wheat. These
wild relatives are valuable out-groups and represent
diversity available in the secondary and
tertiary gene pools. They can be included to
determine the ancestral states of SVs or because
of their significance in introgression breeding
(Harlan and de Wet 1971). Besides incorporating
diverse global varieties of a crop, a pan-genome
initiative
might also choose specific genotypes that have 
a significant role in breeding and genetics. These 
could comprise founder genotypes of breeding 
programs, experimental population parents (Yu 
et al. 2008), or genotypes that can be genetically 
modified (Jain et al. 2019; Schreiber et al. 2020) 
to optimize the advantages for both research and 
breeding communities. These chosen accessions 
will serve as reference genotypes for future 
functional and genetic studies in pan-genomic 
research. 
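
To make the idea of choosing a core set from a genetic distance matrix concrete, the sketch below uses a simple greedy "farthest-point" heuristic: start from the two most distant accessions, then repeatedly add the accession whose smallest distance to the already chosen set is largest. This is an illustrative stand-in and not the algorithm implemented in Core Hunter; the accession names and distances are hypothetical.

```python
def greedy_core_set(names, dist, k):
    """Pick k accessions that are spread out in a symmetric distance matrix.

    names: list of accession names.
    dist:  square matrix (list of lists) of pairwise genetic distances.
    k:     size of the core set, with 2 <= k <= len(names).
    """
    n = len(names)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of accessions")
    # Seed the core set with the most distant pair of accessions.
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    first, second = max(pairs, key=lambda p: dist[p[0]][p[1]])
    chosen = [first, second]
    while len(chosen) < k:
        remaining = [i for i in range(n) if i not in chosen]
        # Add the accession that is farthest from its nearest chosen neighbour.
        nxt = max(remaining, key=lambda i: min(dist[i][c] for c in chosen))
        chosen.append(nxt)
    return [names[i] for i in chosen]


# Toy data: five accessions placed on a line so that distances are easy to check.
toy_names = ["acc_A", "acc_B", "acc_C", "acc_D", "acc_E"]
toy_positions = [0, 1, 5, 9, 10]
toy_dist = [[abs(a - b) for b in toy_positions] for a in toy_positions]

core = greedy_core_set(toy_names, toy_dist, 3)
print(core)  # ['acc_A', 'acc_E', 'acc_C']
```

In the toy example the two extremes are selected first and the central accession third, whereas the near-duplicates acc_B and acc_D are left out, which is the behaviour one wants from a small, representative panel.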

The International 10+ Wheat Genomes
Project (www.10wheatgenomes.com) was
established in 2019 with the goal of creating 
reference-quality genome assemblies for at 
least ten diverse bread wheat cultivars. Using 
genomic diversity analysis of 3800 wheat samples,
ten wheat lines were chosen and sequenced
utilizing Illumina short read sequencing tech- 
nologies, and then assembled using NRGene’s 
DeNovoMagic algorithm (NRGene, Ness Ziona, 
Israel). Subsequently, all these assemblies were 
organized into subgenome-aware pseudomol- 
ecules with the aid of Hi-C technology (van 
Berkum et al. 2010). Additionally, five other
wheat varieties were also sequenced and assem- 
bled to the scaffold level using separate short- 
read assembly algorithms established at the 
Earlham Institute (Norwich, UK). 

A gene projection strategy was imple- 
mented and applied to all assemblies to evalu- 
ate and compare the gene content of the newly 
sequenced lines in a fair and consistent manner, 
given the lack of genome-specific transcrip- 
tome data available at that time. This strategy 
involved using the CHINESE SPRING reference 
gene models and transferring them to all assem- 
blies. Differences in gene content among the 
10+ Wheat reference genomes were observed,
likely due to the complex breeding histories 
of the selected lines. These variations in gene 
content were found to be linked with adapta- 
tion to different environments and with efforts 
to enhance grain yield, quality, and resistance 
to abiotic and biotic stresses. Significant struc- 
tural rearrangements and introgressions from 
wild relatives were observed upon comparing 
the pseudomolecule structures of the reference 
sequences. This underscores the importance of 
having multiple reference genomes of quality 
(at pseudomolecule level) instead of relying on 
resequencing approaches, as only chromosome- 
level assemblies can provide information on 
large- and small-scale structural rearrangements 
with a high degree of resolution and accuracy. 
The study conducted by Walkowiak et al. (2020) 
illustrates how the wheat pan-genome can be
utilized to study causal genes for traits, as the
genomes were used to uncover the gene Sm1,
which confers resistance to the orange wheat blossom midge.
With the availability of recently sequenced and 
compiled wheat reference genomes, there is an 
unprecedented opportunity to identify func- 
tional genes and enhance wheat breeding. The 
subsequent phase of the project will involve
generating de novo gene predictions for all 
chromosome-scale assemblies using exten- 
sive transcriptome data. These data will offer 
a comprehensive understanding of the func- 
tional and regulatory arrangement of the wheat 
pan-genome. 

While the 10+ Wheat Genomes Project
provided the first insights into the wheat pan- 
genome, sequencing and assembly methods 
continued to evolve. Throughput increased for 
both PacBio and ONT sequencing platforms, 
leading to additional genome assemblies (Aury 
et al. 2022). Further, PacBio released its HiFi
sequencing method based on circular consen- 
sus sequencing, which significantly improved 
sequencing and assembly accuracy. These long 
and accurate sequencing reads have led to the 
highest-quality genome assemblies of wheat 
achieved thus far. With the upcoming release 
of new long read sequencing technologies with 
high accuracy and output, such as the Revio 
platform from PacBio, it is expected that addi- 
tional genomes for wheat will be released in the 
coming years. With genome sequencing and
assembly no longer constrained by technological
limitations, the next chapter is integrating
these data into a functional pan-genome
that will drive future research and breeding.


14.6 A Functional Pan-Genome
for Wheat Research 
and Applied Breeding 


Pan-genome construction is the process of cre- 
ating a comprehensive set of genetic informa- 
tion from a collection of related genomes. It is 
a complex task, requiring the use of multiple 
approaches and techniques. It involves assembling
and annotating all genomic information
and variants, and the result can be used to
understand genome and gene evolution, discover
new genes and alleles, and investigate gene-gene
interaction networks.

To construct a pan-genome, two primary
methods can be utilized: whole-genome assembly
and comparative genomics. Whole-genome
assembly involves assembling all of the reads
from a collection of genomes into a single,
contiguous genome. The steps for whole-genome
assembly are well-documented (Jung
et al. 2020). The approach is most appropriate 
for genomes that are closely related and pos- 
sess significant sequence similarity. It offers 
the benefit of an all-encompassing perspective 
on the species’ genetic variation, but it is often 
restricted by the number of genomes that can be 
sequenced. Comparative genomics (Pop et al.
2004), on the other hand, involves comparing 
and contrasting multiple genomes to identify 
shared and unique components. This method is 
most suitable for more distantly related genomes 
with lower sequence similarity. 

The ability to assemble high-quality reference 
genomes for numerous plants simultaneously has 
been made possible by recent advancements in 
sequencing technologies and bioinformatic tools. 
Despite this progress, it is still challenging to 
perform combined analysis of multiple genomes 
or a subset of genomes and provide readily 
accessible genetic information to end-users, such 
as researchers and breeders (Li et al. 2020b). The 
comparison, analysis, and visualization of multi- 
ple reference genomes and their diversity neces- 
sitate powerful and specialized computational 
strategies and tools. De novo assembly, iterative 
assembly, and graph-based assembly methods 
have been employed to construct pan-genomes 
(Li et al. 2014; Liu and Tian 2020). 


14.6.1 De Novo Assembly 


Constructing a pan-genome can be achieved 
through the de novo assembly of genomes from 
multiple individuals, followed by comparative 
analysis to identify variant types and classify 
them as core or flexible genome components. 
This approach has been discussed by Mahmoud 
et al. (2019). Technological advancements in
sequencing and assembly methods have enabled
the generation of high-quality, chromosome-level
plant genomes, including telomere-to-telomere
genome assemblies (Miga et al. 2020). However,
generating accurate genome
assemblies can be costly, especially for large
plant genomes, and may not be practical when 
dealing with hundreds of reference genomes 
for a single species (Hurgobin and Edwards 
2017). Nevertheless, the 10+ Wheat Genomes
Project was successful at the construction of 
several chromosome-scale assemblies. These
genomes were released along with tools to visualize
haplotype blocks representing shared or unique
regions between the assemblies
(http://www.crop-haplotypes.com/) (Brinton et al. 2020).
Likewise, many of the wheat genomes had 
major introgressions or large structural variants, 
which could be visualized using synteny viewers 
(https://kiranbandi.github.io/10wheatgenomes/, 
http://1Owheatgenomes.plantinformatics.io/). 
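
As a minimal sketch of the classification step described above, and assuming that gene presence/absence has already been called for each assembled accession, the following Python example labels genes as core or flexible. The accession names, gene identifiers and the core_threshold parameter are hypothetical and are used purely for illustration.

```python
from collections import Counter


def classify_genes(presence, core_threshold=1.0):
    """Split genes into core and flexible sets.

    presence: dict mapping accession name -> set of gene ids found in it.
    core_threshold: fraction of accessions a gene must occur in to be 'core'
    (1.0 means present in every accession; this cut-off is an assumption).
    """
    n_accessions = len(presence)
    counts = Counter(gene for genes in presence.values() for gene in genes)
    core = {g for g, c in counts.items() if c / n_accessions >= core_threshold}
    flexible = set(counts) - core
    return core, flexible


# Hypothetical presence/absence calls for three accessions
presence = {
    "acc1": {"geneA", "geneB", "geneC"},
    "acc2": {"geneA", "geneB"},
    "acc3": {"geneA", "geneC", "geneD"},
}

core, flexible = classify_genes(presence)
print("core:", sorted(core))
print("flexible:", sorted(flexible))
```

Relaxing core_threshold (for example to 0.95) is one way to tolerate occasional missing annotations, although the appropriate cut-off depends on the dataset.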


14.6.2 Iterative Assembly 


The iterative assembly approach differs from de 
novo assembly in that it commences with the 
creation of a single-reference genome, which 
is then used as a framework for the sequential 
alignment of reads from other samples. Any 
unmapped reads are subsequently assembled 
and incorporated into the reference genome to 
form a non-redundant pan-genome (Golicz et al. 
2016). This technique is less expensive than de 
novo assembly since low sequencing depths can 
be used for each sample, allowing for the pool- 
ing of numerous samples. Nevertheless, the 
iterative assembly method may struggle to han- 
dle genomes that contain many repeat regions 
and is not capable of detecting large structural 
variations that cannot be covered by individ- 
ual short reads (Jiao and Schneeberger 2017). 
Resequencing and iterative assembly methods 
have been applied to wheat (Montenegro et al. 
2017; Watson-Haigh et al. 2018). However, 
evidence suggests that wheat has a very plastic 
genome due to its allopolyploidy and has 
abundant PAV, CNV, and SV that are important 
for trait variation (www.10wheatgenomes.com; 
Nilsen et al. 2020). Therefore, iterative 
assembly approaches, particularly low-coverage 
reference-based analyses, are highly limiting 
when exploring wheat pan-genomics. 
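
The iterative logic described in this section can be sketched with a toy Python example. This is only an illustration under strong simplifying assumptions: exact substring matching stands in for read alignment, and a naive overlap merge stands in for assembly of the unmapped reads. The sequences and function names are hypothetical, and no real aligner or assembler is represented.

```python
def overlap(left, right, min_overlap):
    """Length of the longest suffix of `left` equal to a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def merge_reads(reads, min_overlap):
    """Naive stand-in for assembly: extend a contig when a read overlaps its end."""
    contigs = []
    for read in reads:
        for i, contig in enumerate(contigs):
            ov = overlap(contig, read, min_overlap)
            if ov:
                contigs[i] = contig + read[ov:]
                break
        else:
            contigs.append(read)
    return contigs


def iterative_pan_genome(reference, samples, min_overlap=4):
    """Add sequence absent from the growing pan-genome, one sample at a time."""
    pan = [reference]
    for reads in samples:
        # 'Mapping' here is exact substring matching (a toy assumption)
        unmapped = [r for r in reads if not any(r in c for c in pan)]
        pan.extend(merge_reads(unmapped, min_overlap))
    return pan


reference = "ACGTACGGTCA"
samples = [
    ["ACGTAC", "GGTCA", "TTTGGCA", "GGCACCT"],  # two reads are novel
    ["TGGCAC", "CGTACG", "AAATTTCC"],           # one read is novel
]
pan = iterative_pan_genome(reference, samples)
print(pan)
```

Because each sample is compared against the pan-genome built so far, sequence that was novel in the first sample is not added again when it reappears in the second, which is what keeps the result non-redundant.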


14.6.3 Graph-Based Assembly 


Pan-genomes can also be constructed using 
graphs. The most commonly used graph for 
this purpose is the compacted de Bruijn graph, 
which integrates genetic information from dif- 
ferent accessions of a species (Chikhi et al. 
2016; Li et al. 2020a). In contrast, bidirected 
variation graphs capture genetic variations 
throughout a population and identify 
their potential positions on a reference genome. 
Compared to traditional linear genomes, graph- 
based pan-genomes have been shown to signifi- 
cantly mitigate reference bias (Garrison et al. 
2018). However, graph-based pan-genomes are 
challenging to construct and apply due to sev- 
eral factors, including the intricate nature of 
plant genomes with their high repeat content 
and polyploidy. Additionally, there is a short- 
age of common downstream analysis tools and 
visualization techniques for the graph, which 
further adds to the limitations. Despite these 
challenges, graph-based genomes have strengths 
compared to other methods and may have more 
widespread applications for wheat research 
and breeding in the future, particularly as tools 
for graph-based assembly of more complex 
genomes improve. 
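
To make the idea of a compacted de Bruijn graph more concrete, the following Python sketch builds a small k-mer graph from two hypothetical sequences that differ at a single base, and then collapses non-branching paths into unitigs. It is a simplified illustration only: reverse complements are ignored, isolated cycles are not reported, and it does not reproduce the behaviour of any particular pan-genome tool.

```python
from collections import defaultdict


def build_graph(seqs, k):
    """Nodes are k-mers; an edge joins k-mers that are adjacent in a sequence."""
    succ, pred, nodes = defaultdict(set), defaultdict(set), set()
    for s in seqs:
        kmers = [s[i:i + k] for i in range(len(s) - k + 1)]
        nodes.update(kmers)
        for a, b in zip(kmers, kmers[1:]):
            succ[a].add(b)
            pred[b].add(a)
    return nodes, succ, pred


def compact(nodes, succ, pred):
    """Merge non-branching paths into unitigs (the 'compaction' step)."""
    def is_start(n):
        # A unitig starts where a node cannot be merged into a unique predecessor
        if len(pred[n]) != 1:
            return True
        (p,) = pred[n]
        return len(succ[p]) != 1

    unitigs = []
    for n in sorted(nodes):
        if not is_start(n):
            continue
        path, cur = n, n
        while len(succ[cur]) == 1:
            (nxt,) = succ[cur]
            if len(pred[nxt]) != 1:
                break
            path += nxt[-1]  # extend the unitig by one base
            cur = nxt
        unitigs.append(path)
    return unitigs


# Two hypothetical accessions differing by one base (T vs A at position 5)
accessions = ["ACGTTGCA", "ACGTAGCA"]
nodes, succ, pred = build_graph(accessions, k=3)
unitigs = compact(nodes, succ, pred)
print(unitigs)
```

In this toy case the shared flanks become single unitigs and the variant appears as two alternative unitigs between them, the so-called bubble structure through which graph representations retain variation from all accessions.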


14.6.4 Pan-Genome Annotation 
and Other Pan-Omics 


Once the pan-genome has been assembled, there 
are several techniques that can be used to anno- 
tate it. One technique is to use gene prediction 
software to identify genes in the pan-genome. 
This can be done using homology-based or 
de novo gene prediction algorithms. There is 
a plethora of ab initio gene prediction software 
(Scalzitti et al. 2020), including Augustus 
(Stanke and Morgenstern 2005), Genscan 
(Burge and Karlin 1997), GeneID (Parra et al. 
2000), GlimmerHMM (Majoros et al. 2004), 
and Snap (Korf 2004). Another technique to 
annotate the pan-genome is to use compara- 
tive genomics to identify conserved or novel 
gene families. This involves comparing the 
genomes of different species to identify shared 
and unique components. By comparing gene 
sequences between two species, it is possible to 
identify regions of similarity that may indicate 
similar functions. In wheat, comparative genom- 
ics has been used for identifying resistance 
genes (Marchal et al. 2020) and uncovering the 
molecular basis of nitrogen-use efficiency (Shi 
et al. 2022). In addition to annotating the gene 
space, there is increasing interest in expanding 
the annotation of the pan-genome to include the 
dynamics of gene expression (pan-transcriptomics), 
epigenomic modifications (epipan-genomics) and 
interaction networks among variants and genes, 
and in associating these directly with 
biological traits. Such a complete atlas of 
biological information will equip 
researchers and breeders with unprecedented 
tools for wheat research and improvement. 


14.6.5 Applying the Pan-Genome 
to Breeding 


After constructing and annotating the pan- 
genome, the subsequent step involves utiliz- 
ing it for crop enhancement. The effectiveness 
of next-generation breeding technologies, such 
as transgenics and CRISPR-Cas9 gene editing, 
has been proven for wheat (Nilsen et al. 2020). 
However, regulatory challenges exist that may 
limit the widespread adoption of these methods 
for delivering new wheat cultivars. As a result, 
wheat breeding will likely involve generating 
biparental populations and screening progeny 
for some time to come. Gene discovery 
has certainly benefitted from the availability of 
pan-genomics resources for wheat, facilitating 
marker discovery that can be applied to MAS 
and making screening of parental lines and 
progeny more efficient (www.10wheatgenomes.com). 
With the availability of more genome 
assemblies that are representative of the genes 
and genomic variants that can be used in breed- 
ing, the need to generate additional high-quality 
genomes will likely lessen as genomes can be 
imputed based on lower coverage haplotype 
information, for example from 
genotyping-by-sequencing or high-throughput SNP 
arrays (Alipour et al. 2019). Having genomic 
information available for the parental materials 
being used in crosses, even if imputed, will 
allow breeders to make stronger associations 
between traits of interest and variants within 
the genome, allowing for more efficient and 
targeted genomic-based selections to be made in 
their resulting progeny through GS. 
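
The principle of imputing missing genotypes from known parental haplotypes can be illustrated with a deliberately naive Python sketch. It assumes a bi-parental cross with fully known parental alleles and assigns each missing marker in a progeny line to the parent indicated by the nearest observed informative marker. Practical imputation pipelines generally rely on probabilistic models that account for recombination and genotyping error; the function name and example data here are hypothetical.

```python
def impute(parent_a, parent_b, progeny):
    """Fill missing progeny calls (None) using the nearest informative marker.

    parent_a, parent_b: lists of alleles at ordered marker positions.
    progeny: list of alleles with None where no call was obtained.
    """
    # Infer parent-of-origin wherever the progeny has a call and parents differ
    origin = {}
    for i, call in enumerate(progeny):
        if call is None or parent_a[i] == parent_b[i]:
            continue
        if call == parent_a[i]:
            origin[i] = "A"
        elif call == parent_b[i]:
            origin[i] = "B"

    imputed = list(progeny)
    for i, call in enumerate(progeny):
        if call is not None:
            continue
        if parent_a[i] == parent_b[i]:
            imputed[i] = parent_a[i]  # monomorphic marker: no ambiguity
        elif origin:
            nearest = min(origin, key=lambda j: abs(j - i))
            imputed[i] = parent_a[i] if origin[nearest] == "A" else parent_b[i]
    return imputed


parent_a = list("AACCGGTT")
parent_b = list("AGCTGATC")
progeny = [None, "A", None, None, "G", None, None, "C"]  # sparse calls

print(impute(parent_a, parent_b, progeny))
```

In this example the progeny line matches parent A near the start of the marker order and parent B near the end, so missing markers are filled from whichever parent is supported by the closest informative call.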


14.7 Conclusion and Future 
Directions 


Owing to its ability to identify novel genetic 
variations that can enhance crucial traits, the 
pan-genome serves as a valuable asset for crop 
breeding, specifically in wheat. Through consist- 
ent pan-genome research in crops, more robust 
and productive varieties are expected to be devel- 
oped, resulting in benefits for farmers and con- 
sumers worldwide. While it is difficult to predict 
all possible future applications of pan-genomics 
to wheat breeding, the resources are now avail- 
able to innovate. With recent advances in GS, 
artificial intelligence, and deep learning, one can 
only imagine the possibilities when applying 
these tools to pan-genomics, particularly if the 
pan-genomes are well annotated and have asso- 
ciated phenotypic data generated through applied 
breeding. This may not only be able to predict 
the performance of parents or offspring but could 
potentially help optimize designer genomes for 
specific purposes, environments, or stresses. 


Abstract 


Future wheat production faces considerable 
challenges, such as how to ensure on-farm 
yield gains across agricultural environments 
that are increasingly challenged by factors 
such as soil erosion, environmental change 
and rapid changes in crop pest and disease 
profiles. Within the context of crop improve- 
ment, the ability to identify, track and deploy 
specific combinations of genes tailored for 
improved crop performance in target environ- 
ments will play an important role in ensuring 
future sustainable wheat production. In this 
chapter, a range of germplasm resources and 
populations that can be exploited for genetic 
locus discovery, characterisation and functional 
analysis in wheat are reviewed. These 
include experimental populations constructed 
from two or more parents, association map- 
ping panels and artificially mutated popula- 
tions. Efficient integration of the knowledge 
gained from exploiting such resources with 
other emerging breeding approaches and 
technologies, such as high-throughput field 
phenotyping, multi-trait ensemble phenotypic 
weighting and genomic selection, will 
help underpin future breeding for improved 
crop performance, quality and resilience. 


Keywords 


Multi-parent populations · Plant genetic 
diversity · Sustainable crop production · 
Nested association mapping (NAM) · 
Multi-parent advanced generation intercross 
(MAGIC) · Targeting Induced Local Lesions 
in Genomes (TILLING) 


15.1 Gene Discovery in the Context 
of Wheat Improvement and Breeding 


If you compare two bread wheat (Triticum aes- 
tivum L.) cultivars, the chances are that you 
will find differences between them—and lots of 
them. Whether these differences are for agro- 
nomic traits, such as resistance to disease, for 
quality traits such as those important for bread 
making, or for a range of morphological traits 
such as those used to uniquely ‘describe’ a 
variety during varietal registration (Jones et al. 
2013), such variation is abundant. It is the herit- 
able component of these observable differences 
that is exploited via breeding to deliver new, 
improved wheat varieties, while dealing with the 
complexities of pleiotropic effects that result 
from the process. The question as to how 
best to do this is not a straightforward one. To 
give a simplified example, phenotypic selec- 
tion for underlying combinations of genes and 
alleles that result in increased grain number per 
ear may result in fewer ears overall. Similarly, 
increasing the grain protein is often associated 
with a reduction in overall grain yield in wheat 
(Simmonds 1995; White et al. 2022) and other 
crop species (e.g. Dudley 2007), and increas- 
ing leaf size is thought to result in larger, but 
less dense stomata (Zanella et al. 2022). As 
the principal breeding target, grain yield repre- 
sents the sum of all interacting genetic/epige- 
netic, environmental and management factors 
that occur from sowing to harvest. Selection 
for grain yield works well, with breeders having 
consistently delivered ~1% genetic gains per 
year in wheat yield potential over recent dec- 
ades (e.g. Mackay et al. 2011). To some extent, 
wheat breeding practices focus on delivering 
performance under the assessment criteria and 
carefully managed growth conditions used by 
national bodies to determine subsets of the ‘best’ 
varieties marketed at a given time. In the United 
Kingdom (UK), for example, the annual AHDB 
‘Recommended List’ provides performance 
data for such varietal subsets to help farmers 
choose which varieties to grow (www.ahdb. 
co.uk/knowledge-library/reccommended-lists- 
for-cereals-and-oilseeds-rl). However, on-farm 
wheat yields are increasingly falling behind the 
genetic potential of the varieties grown. Termed 
the ‘yield-gap’, and observed in wheat growing 
areas across the world (Senapati et al. 2022), 
this is likely to be due to the cost-benefit and 
practical considerations and trade-offs that 
take place under commercial farm conditions. 
Future wheat production will face additional 
challenges such as environmental change, soil 
degradation, increasing energy and input costs, 
and the effects of political conflict or instability. 
Thus, wheat genetic improvement will increas- 
ingly need to focus on yield stability under 
sub-optimal, fluctuating or unpredictable growth 
environments—delivered within the context of 
more sustainable food production systems. As 
the development of new wheat varieties is a rela- 
tively lengthy process (typically taking around 
10 years), all available tools must be exploited 
to meet these challenges. As underpinning tech- 
nologies advance, the ability to identify specific 
wheat genes or genetic loci, and understand how 
they function and interact within the context 
of crop performance, will play an increasingly 
important role towards delivering future wheat. 


15.2 Genetic Variation in Hexaploid 
Bread Wheat: Luck, 
Bottlenecks and Breeding 


If the foundation of gene discovery is heritable 
variation, then before exploring the germplasm 
and genomic resources currently in use to accel- 
erate gene discovery and functional analysis in 
wheat, it is first worth briefly considering the 
history behind current wheat genetic varia- 
tion. Collectively, the natural genetic variation 
present in modern day wheat represents the 
culmination of the speciation, domestication 
and breeding events and processes that have 
occurred in its past. Human selection and inter- 
ventions have affected the wheat genome and 
the variation it contains, starting from its first 
origins in Neolithic farmers' fields, up to the 
current day. However, variation at the DNA level 
is not so evenly distributed across the bread 
wheat genome. To some extent, this is due to 
the order, age and nature of the polyploidisation 
events that occurred during its speciation. The 
bread wheat genome is hexaploid (2n = 6x = 42), 
which means it consists of three subgenomes 
that have merged via inter-species hybridisation 
events during its evolutionary history (reviewed 
by Levy and Feldman 2022; Fig. 15.1). Notably, 
the most recent event was a spontaneous hybrid- 
isation around 9000 years ago between the tetra- 
ploid progenitor of pasta wheat (the AA and BB 
subgenome donor) and a diploid wild wheat 
relative that grew alongside it called ‘goat grass’ 
(Aegilops tauschii Coss., DD subgenome donor) 
to create hexaploid bread wheat (AABBDD). 
Due to this event being rare, recent, and having 
occurred in a restricted Ae. tauschii 
sub-population close to the Caspian Sea (Wang 
et al. 2013), little D subgenome variation was 
captured, and there has been comparatively 
little time for genetic variation to 
subsequently accumulate via spontaneous 
mutation. The effect of this is evident in 
genetic analyses of bread wheat varieties 
from across the world (e.g. Wang et al. 2014; 
Walkowiak et al. 2020; Mellers et al. 2020), 
where D subgenome variation within genes is 
typically one-third to one-tenth of that seen on 
the A and B subgenomes. Consistent throughout 
the wheat subgenomes, however, is that gene 
density and gene variation are lower across the 
centromeric and adjacent pericentromeric 
chromosomal regions than in the remaining more 
distal chromosomal positions (IWGSC 2018). 
These centromeric and pericentromeric regions 
are associated with higher frequency of 
transposable elements (IWGSC 2018), higher 
levels of epigenetic modifications to DNA and 
histones associated with heterochromatin 
(tightly packed DNA), and lower genetic 
recombination (Gardner et al. 2016; Gardiner 
et al. 2019), which together are thought to 
result in the restricted rates of genome 
evolution observed in these regions (Akhunov 
et al. 2003). 


Fig. 15.1 Evolutionary history of hexaploid bread 
wheat (Triticum aestivum) from its diploid and 
tetraploid progenitors. The unknown or extinct 
wheat B subgenome donor is a derivative of the 
S-genome species of the section Sitopsis, which 
includes diploid Ae. speltoides (SS genome), 
Ae. bicornis (SbSb), Ae. longissima (SlSl), 
Ae. searsii (SsSs) and Ae. sharonensis (SshSsh). 
MYA = millions of years ago 
[Figure labels: 4.49 MYA, B subgenome donor 
divergence from Ae. speltoides; 1.28 MYA, 
A subgenome donor divergence from T. urartu; 
0.80 MYA, speciation of wild T. turgidum ssp. 
dicoccoides, followed by domestication; 
9000-8500 YA, speciation of T. aestivum. 
Lineages shown: Ae. speltoides (SS), unknown 
B donor (BB), D lineage, T. urartu (AuAu), 
T. turgidum (AABB), T. aestivum (AABBDD), 
Ae. tauschii (DD)] 


Against this genomic backdrop, in the ~9000 
years since its speciation, bread wheat has been 
accumulating natural mutations which have either 
been retained or lost along the way due to a 
combination of selection, drift and geneflow. 
Such shifts 
in variation have underpinned the many genera- 
tions of ‘on-farm’ selection that occurred from 
Neolithic times up until the advent of industrial 
breeding approaches at the end of the nineteenth 
century. Accordingly, wheat genetic variation 
was modulated across this time period by the 
interplay between human selection, be it con- 
scious (such as selection for larger grains) or 
unconscious (such as selection for photoperiod 
insensitive lines; Jones and Lister 2022), and 
environmental factors such as prevailing climate 
and disease pressures. This ongoing domestica- 
tion process resulted in the numerous locally 
adapted ‘landraces’ that were grown across the 
world’s wheat growing regions up until the end 
of the 1800s. Early breeders exploited these 
sources of genetic diversity by systematically 
selecting and evaluating such landraces, as 
well as the crosses made between them. The 
outcomes of this history are still evident in 
modern wheat varieties, as these first breed- 
ing programmes commonly exploited the lan- 
draces that were locally adapted to their regions 
at the time. For example, genetic 
marker analysis of wheat from around the world 
shows clustering of Chinese landraces and cul- 
tivars in genetic diversity space (Cavanagh 
et al. 2013), while in an analysis of 180 UK 
varieties released since the year 2000, almost 
90% include genetic contributions from the old 
Ukrainian landrace OSTKA-GALICYJSKA 
and the Mediterranean landrace from which the 
early UK variety SQUAREHEAD was devel- 
oped (Fradgley et al. 2019). Over the years, 
there have been concerns that the industrial 
breeding era has resulted in genetic bottlenecks 
in numerous crops, and that this has restricted 
genetic diversity in modern wheat. While there 
are many approaches to measure genetic diver- 
sity loss (reviewed by Khoury et al. 2021), for 
wheat it is clear that more genetic diversity was 
present in the landraces versus pure-line bred 
cultivars (e.g. Winfield et al. 2018). However, 
a loss of diversity within the modern 
breeding period is not necessarily so apparent, 
with changes in diversity depending on mul- 
tiple factors, including the time period and 
region studied. One factor that has been noted 
is a reduction in genetic diversity at and soon 
after the introduction of the ‘Green Revolution’ 
semi-dwarfing genes across all international 
breeding programmes from the 1960s onwards 
(see Chap. 11). However, recent studies of on- 
farm wheat diversity indicate that at a national 
level, growers may now actually deploy a much 
more diverse portfolio of cultivars than was used 
100 years ago. For example, in the USA the 
number of major commercially grown wheat 
cultivars has increased progressively, increas- 
ing fivefold from 1919 (33 cultivars) to 2019 
(186 cultivars) with pedigree-based diversity 
measures of 1353 commercial USA varieties 
grown across this period indicating this increase 
in cultivar diversity is likely linked to increased 
genetic diversity (Chai et al. 2022). In the UK, 
combining measures of relatedness based on 
shared parentage (kinship), weighted by the pro- 
portional yearly acreage of cultivars over the last 
30 years, found an increasing trend in the result- 
ing landscape diversity index (Fradgley 2022). 
While the dominance of a very low number of 
varieties across national cropping landscapes is 
not as common as it once was (such as the use 
of cv. CAPPELLE-DESPREZ across more than 
50% of the UK cropping area in the 1960s; 
Srinivasan et al. 2003), this is not necessarily 
the case throughout the wheat growing regions 
of the world. For example, between 2005 and 
2010 the cultivar WYALKATCHEM repre- 
sented more than 30% of the Australian wheat 
area sown, while more recently cv. MACE rep- 
resented over 65% of the wheat cropping area in 
both 2015 and 2016 (Phan et al. 2020). Notably, 
these recent examples of low Australian land- 
scape scale cultivar diversity are set against a 
wider background of a reduction in Australian 
wheat genetic diversity post Green Revolution 
(Joukhadar et al. 2017) and highlight the poten- 
tial vulnerability of such landscape scale cultivar 
predominance to changes in pest and environ- 
mental pressures. 


15.2.1 Systematic Broadening of the 
Wheat Genepool as Wild Wheats 
Are Deployed 


A longstanding concern is that breeding results 
in loss of genetic diversity—however, as noted 
above this assumption is not a given. A good 
example in cereals is the maize long-term selec- 
tion experiment, where continuous genetic 
gains within a closed population in response to 
selection for seed protein and oil content were 
observed across the 100-year programme, with 
no significant loss in genetic diversity (Dudley 
2007). Presumably, this was achieved via con- 
tinued selection for genetic loci of small additive 
effect, as well as the fixing of epistatic 
interactions (i.e. instances where the allele of one gene 
hides or masks the phenotype of another gene) 
as additive effects. It is thus feasible to optimise 
existing variation present in wheat cultivars into 
new combinations, and to bring in additional 
genetic and functional diversity from system- 
atic introgression and analysis of chromosomal 
regions originating from landraces and species 
related to wheat. When present in otherwise 
elite wheat genetic backgrounds, the chromo- 
somal segments present in such ‘wilder wheats’ 
can often provide agronomic performance gains, 
despite the possible negative impacts of such 
chromosomal tracts (due, for example, to link- 
age drag or local effects on genetic recombina- 
tion). Reminiscent of the activities at the start 
of the industrial breeding age, initiatives across 
the world are once again systematically screen- 
ing variation captured in wheat landraces and 
are now supported by modern genetics, genom- 
ics, experimental population designs and analy- 
sis approaches. For example, the Watkins bread 
wheat landrace collection of 826 accessions 
from 32 countries has been genotyped using 
41 microsatellite markers (Wingen et al. 2014), 
and selected accessions from across the genetic 
diversity space crossed to an elite spring culti- 
var to create a series of bi-parental genetic map- 
ping populations (Wingen et al. 2017), termed a 
nested association mapping (NAM) panel. The 
benefits afforded by ‘wilder wheats’ created via 
introgressions from wheat relatives are illus- 
trated by the UK cultivar ROBIGUS. Released 
in the UK in 2003, ROBIGUS delivered high 
yields and contained particularly novel genet- 
ics derived from a wheat wild relative (Gardner 
et al. 2016) and has been frequently used in the 
pedigrees of subsequent UK varieties (Fradgley 
et al. 2019)—without associated loss of wheat 
cultivar genetic diversity at landscape scale 
(Fradgley 2022). Genomic analyses now show 
that the presence of introgressions from wheat 
relatives is relatively common (e.g. Cheng 
et al. 2019; Keilwagen et al. 2022; Pont et al. 
2019; Przewieslik-Allen et al. 2021; Scott et al. 
2020a). Indeed, introgressions often underlie 
genomic regions conferring agronomically 
important traits, particularly disease resistance 
(Aktar-Uz-Zaman et al. 2017). For example, 
resistance to the wheat fungal disease yellow 
rust conferred by Yr34 originated from a region 
of chromosome 5A introgressed over 200 years 
ago from einkorn wheat (T. monococcum L. ssp. 
monococcum; Chen et al. 2021), and still con- 
fers field resistance in the US (Chen et al. 2021) 
and UK (Bouvet et al. 2022b). The long breed- 
ing history of use and utility of introgression 
from wheat relatives is exemplified by the exten- 
sive use since the late 1980s of synthetic hexa- 
ploid wheats in the international wheat breeding 
programme run by the International Maize and 
Wheat Improvement Center (CIMMYT) (Das 
et al. 2016; see also Chap. 11). Synthetic hexa- 
ploid wheats address the lack of genetic diver- 
sity on the wheat D subgenome by recreating 
the ancient hybridization event between tetra- 
ploid wheat and Ae. tauschii. This is under- 
taken via inter-specific crosses followed either 
by embryo rescue, chromosome doubling (Li 
et al. 2018a, b) or use of specific cytogenetic 
stocks (Othmeni et al. 2022). While more than 
1200 synthetic wheats have been generated by 
CIMMYT, historically these have sampled a 
relatively narrow range of Ae. tauschii diversity 
from the eastern Fertile Crescent. Systematic 
broadening of the diversity sampled in synthetic 
wheats is now being undertaken at pre-breeding 
initiatives at NIAB in the UK, where D subge- 
nome Ae. tauschii genetic diversity from across 
its natural eco-geographic range is being cap- 
tured in new synthetics and backcrossed into 
elite cultivars (Gaurav et al. 2022). While this 
and other initiatives (e.g. Zhou et al. 2021) are 
providing new sources of D subgenome genetic 
variation for breeding, similar approaches are 
systematically bringing in additional diversity 
from wheat A and B subgenome donors via the 
creation of inter-specific hybrids and 
subsequent backcrossing. For example, 
backcross-derived progenies have been generated 
from crosses between 59 diverse accessions of 
tetraploid T. turgidum ssp. durum and the elite 
spring wheat cv. PARAGON (see also Chap. 8). 
Introgressions into elite wheat varieties from 
more distantly 
related diploid and polyploid grass species are 
also being generated, including Amblyopyrum 
muticum (TT genome, Coombes et al. 2022) and 
Thinopyrum species (Li and Wang 2009; Grewal 
et al. 2018; Li et al. 2018a, b; Cseh et al. 2019; 
Baker et al. 2020). The utility of genetic loci 
originating from the tertiary wheat genepool has 
begun to lead to the identification of the 
underlying genes and genetic variants; for 
example, the wheat Fhb7 locus conferring 
resistance to the fungal disease Fusarium head 
blight, which originated from a Th. elongatum 
introgression, has been shown to encode a 
glutathione S-transferase that detoxifies toxins 
produced by the infecting fungus (Wang et al. 2020). 


15.3 Current Genome-Wide 
Genotyping Approaches 
for Wheat 


The history of speciation, domestication and 
breeding outlined above has shaped the heritable 
variation present across the wheat genome. At 
the DNA level, this variation includes changes 
to single nucleotides (single nucleotide 
polymorphisms, SNPs), as well as rearrangements 
that typically involve DNA double strand break 
repair, such as DNA insertions or deletions 
(InDels), gene copy number variation (CNV) and 
larger chromosomal rearrangements such as 
translocation and/or inversion of larger tracts 
of DNA. In the 2000s, advances in wheat research 
such as the sequencing across multiple tissues, 
developmental stages and cultivars of 
complementary DNA (cDNA) transcribed from 
messenger RNA (mRNA), and subsequently the 
availability of 
genome assemblies for cv. CHINESE SPRING 
(the wheat reference genome; IWGSC 2018) and 
15 additional wheat cvs. (Walkowiak et al. 2020; 
Chap. 14) (Table 15.1) have led to detailed cata- 
logues of both genic and non-genic DNA varia- 
tion. Due to their abundance and nature, wheat 
studies over the last 10 years have most com- 
monly assayed genic single nucleotide poly- 
morphisms (SNPs) for use in genetic mapping 
approaches. Since the publication of the first 
high-density wheat genotyping array in 2013, 
capable of assaying ~9000 SNPs (Cavanagh et al. 
2013), several additional arrays ranging from 
3000 to 850,000 features are now available (Table 
15.2). While SNP genotyping arrays are relatively 
simple and cheap to use, one drawback is that 
only those variants that have been pre-selected 
to be present on the array can be assayed. Thus, 
if the SNP identification panel used to design 
the array does not contain adequate sampling of 
the variants in the target genepool, useful infor- 
mation on the variation present in a target set of 
germplasm cannot be adequately assessed. This 
is a common issue for example in synthetic hexa- 
ploid wheat and its derived germplasm, where 
much of the novel D subgenome variation cap- 
tured in this germplasm may not be assayed. 
More recently, reductions in costs have meant 
that sequencing-based genotyping approaches 
have become increasingly used in wheat. These 
include complexity reduction approaches such as 
genotyping by sequencing (GbyS) (Poland et al. 
2012), Diversity Array Technology sequencing 
(DArTseq™; Sansaloni et al. 2011) and exome 
and/or promoter capture followed by Illumina 
short-read (i.e. ~150 bp) sequencing (Table 15.2). 
More recently whole genome low-coverage 
sequencing is beginning to be used for genotyp- 
ing in wheat (Table 15.2) and is considered in 
more detail in Box 1. Natural variation in the 
form of InDels and CNV are also relatively abun- 
dant in the wheat genome (e.g. Pont et al. 2019; 
Walkowiak et al. 2020; Wang et al. 2022), and 
despite the relatively limited number of function- 
ally characterised wheat genes to date (Chap. 9), 
such variation has been shown to be a relatively 
common source of functional variation. For exam- 
ple, just within the flowering time pathway, dele- 
tions across putative cis-regulatory sites caused 
by double-stranded DNA break repair via non- 
homologous recombination have been shown 
to result in at least seven functional alleles of 
the VERNALIZATION1 (VRN-1) flowering time 
gene homoeologues in hexaploid and diploid 
wheat (Cockram et al. 2007), while CNV at the 
PHOTOPERIOD-1 (PPD-1) homoeologues determines 
flowering time in tetraploid and hexaploid 
wheat (Díaz et al. 2012; Würschum et al. 2019; 
see also Chap. 11). 


Table 15.1 Bread wheat cultivars/lines with genome assemblies 

Cultivar          Seasonal growth habit   Origin        Release year   Genome assembly type 
CHINESE SPRING    Spring                  China         NAᶜ            Reference genome¹ 
ALCHEMY           Winter                  UK            2006           PA² 
ARINALRFOR        Winter                  Switzerland   NA             RQA³ 
BROMPTON          Winter                  UK            2005           PA² 
CADENZA           Spring                  UK            1992ᵃ          Scaffold³ 
CDC LANDMARK      Spring                  Canada        2015           RQA³ 
CDC STANLEY       Spring                  Canada        2009ᵃ          RQA³ 
CLAIRE            Winter                  UK            1999           Scaffold³, PA² 
HEREWARD          Winter                  UK            1991           PA² 
JAGGER            Winter                  USA           1994ᵃ          RQA³ 
JULIUS            Winter                  Germany       2008           RQA³ 
LR LANCERᵈ        Spring                  Australia     2013ᵇ          RQA³ 
MACE              Spring                  Australia     2008ᵇ          RQA³ 
NORIN 61          Facultative             Japan         1944ᵃ          RQA³ 
PARAGON           Spring                  UK            1988           Scaffold³ 
RIALTO            Winter                  UK            1994           PA² 
ROBIGUS           Winter                  UK            2003           Scaffold³, PA² 
SOISSONS          Winter                  France        1995           PA² 
SY MATTIS         Winter                  France        2010           RQA³ 
WEEBILL 1         Spring                  Mexico        1999ᵃ          Scaffold³ 
XI19              Facultative             UK            2002           PA² 

RQA reference quality assembly. PA pseudomolecule assembly. NA not applicable. 
ᵃ From GRIS database. ᵇ Application for Plant Breeders’ Rights date. ᶜ Landrace. 
ᵈ LongReach Lancer. ¹ IWGSC (2018). ² Pre-publication BLAST access at 
https://www.cropdiversity.ac.uk/8magic-blast/. ³ Walkowiak et al. (2020). 
Additionally, an RQA is available for a winter spelt wheat (T. aestivum ssp. 
spelta) accession, PI190962, from Central Europe³ 


Table 15.2 Examples of recent high-density, high-throughput wheat genotyping approaches

Each entry lists the genome-wide genotyping approach, followed by the
origin of the DNA variation assayed.

SNP array
  9 k array (Cavanagh et al. 2013)
    DNA variation origin: Genes from cultivars
  90 k array (Wang et al. 2014)
    DNA variation origin: Genes from cultivars
  280 k array (Rimbert et al. 2018)
    DNA variation origin: Genes and intergenic variants identified in
    whole genome sequence of 8 cultivars
  660 k array (Cui et al. 2017)
    DNA variation origin: Unknown
  820 k array (Winfield et al. 2016)
    DNA variation origin: Exomes of 23 bread wheat cvs./landraces, and
    20 spp./accessions of diploid, tetraploid and decaploid wheat
  35 k array (Allen et al. 2017)
    DNA variation origin: Subset of SNPs from the 820 k array, above

DArTseq™ (Sansaloni et al. 2011, e.g. as applied in wheat by Sansaloni
et al. 2020)
    DNA variation origin: DNA variants, including SNPs and SilicoDArT
    (presence/absence variation), identified via genomic complexity
    reduction (achieved via restriction enzyme digestion/ligation), PCR
    amplification, followed by DNA sequencing and bioinformatic analysis

Exome capture
  DNA probes covering 107 Mb of non-redundant exonic target space
  (Jordan et al. 2015), representing 33% of the RefSeq v1.0
  high-confidence gene set
    DNA variation origin: Genes identified from the wheat reference
    genome RefSeq v1.0 annotation (IWGSC 2018). Genes and DNA variants
    identified are dependent on the germplasm assayed

Exome + promotor capture sequencing
  DNA probes covering 509 Mb exonic and 277 Mb promotor space (Gardiner
  et al. 2019). >20 samples can be multiplexed in a single capture
    DNA variation origin: Genes and promotors identified from the
    reference genome annotations of wheat (RefSeq v1.0 annotation,
    IWGSC 2018; TGACv1 annotation, Clavijo et al. 2017), Emmer wheat
    (Avni et al. 2017) and Ae. tauschii (Luo et al. 2017). Genes and
    DNA variants identified are dependent on the germplasm assayed

Genotyping-by-Sequencing (GbyS)
  Complexity reduction via restriction enzyme digestion, adaptor
  ligation, PCR and sequencing (first applied to wheat by Poland et al.
  2012)
    DNA variation origin: DNA variants determined bioinformatically
    from the ~100-150 bp sequence data generated from restriction
    enzyme cleavage sites sampled from across the genome

Skim sequencing
  Whole genome low-coverage DNA sequencing (e.g. as applied to a
  16-founder MAGIC population, Scott et al. 2020a)
    DNA variation origin: DNA variants originate from single sequencing
    reads per genotype assayed. For experimental populations,
    sequencing depth is achieved via reads from all lines in the
    population that carry the same genomic region

PCR polymerase chain reaction


Box 1: Wheat genotyping via skim sequencing

As genotyping via genome skim sequencing is typically undertaken at
significantly less than 1-times genome-wide sequence coverage per line
assayed (termed 1x), multiple reads at any given chromosomal location
are not expected for any single line. Therefore, this approach is
suited to experimental populations with defined founders, such that
confidence in the DNA variants identified from skim sequence in any one
line is achieved via reads obtained from additional lines in the
population that carry the same variant. For example, if there are 200
lines in a bi-parental population, with each line sequenced to 0.3x
coverage, we would expect on average 60x coverage of any single locus,
and therefore 30x coverage of each allele at any bi-allelic locus, i.e.
(200 × 0.3)/2. Thus, by cataloguing the SNPs present at good coverage
in the population as a whole, the presence of any of these SNPs
identified via a single sequencing read in any given line can be called
with good confidence. Pre-determining the sequence variants present in
the population founders, for example by exome capture or whole genome
assembly, may help the process of variant calling and the imputation of
variants that are not directly sequenced in any given line. For
example, Scott et al. (2020a) sequenced the 16 founders of a wheat
multifounder population via exome-promotor capture, identifying 1.13
million SNPs across the 110,790 genes targeted by the capture probes.
They then skim sequenced the 501 derived recombinant inbred lines
(RILs) at 0.3x coverage, which directly identified ~28% of these SNPs
(i.e. 1.13 million SNPs × 0.3 = 339,000 SNPs). SNP imputation in the
RILs was then undertaken using the software STITCH (Davies et al.
2016), allowing 94% of the 1.13 million founder SNPs to be called and
founder haplotype dosage at each chromosomal location to be assigned
for all RILs. Down-sampling the 0.3x read coverage showed that RIL
genotypes could be accurately inferred from sequence coverage as low as
0.076x per RIL. Notably, at sequence coverage of 0.076x and above,
imputation accuracy was not dependent on whether or not founder
haplotypes were included as a reference panel. This means that accurate
haplotype mosaics in the RILs could be achieved without the need to
generate data on the 16 founders. In summary, imputation from
low-coverage whole genome sequencing of experimental populations
represents a relatively straightforward and cost-effective genotyping
strategy for bi-parental and multifounder experimental wheat
populations and does not suffer from the inherent bias of SNP array
genotyping approaches that require the variants targeted to be
pre-identified.
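The coverage arithmetic used in Box 1 can be checked with a few lines of Python. This is a minimal sketch rather than part of any published pipeline: the function names are hypothetical, and it assumes that reads are spread evenly across lines and that the alleles at a locus segregate at equal frequency.

```python
def pooled_locus_coverage(n_lines: int, coverage_per_line: float) -> float:
    """Expected read depth at one locus when reads from all lines are pooled."""
    return n_lines * coverage_per_line


def pooled_allele_coverage(n_lines: int, coverage_per_line: float,
                           n_alleles: int = 2) -> float:
    """Expected pooled depth per allele.

    Assumes equal allele frequencies (an illustrative assumption that
    holds on average in a bi-parental population).
    """
    return pooled_locus_coverage(n_lines, coverage_per_line) / n_alleles


def snps_expected_per_line(n_founder_snps: int, coverage_per_line: float) -> float:
    """Founder SNPs expected to be hit directly in one line.

    Uses the simple product quoted in Box 1 (number of SNPs x coverage).
    """
    return n_founder_snps * coverage_per_line


if __name__ == "__main__":
    # 200 lines at 0.3x, as in the bi-parental example in Box 1
    print(f"{pooled_locus_coverage(200, 0.3):.0f}x per locus")
    print(f"{pooled_allele_coverage(200, 0.3):.0f}x per allele")
    # 1.13 million founder SNPs at 0.3x per RIL
    print(f"{snps_expected_per_line(1_130_000, 0.3):,.0f} SNPs per line")
```

The same functions can be reused to explore how many lines are needed to reach a chosen pooled depth at lower per-line coverage.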


15.4 Genetic Mapping Resolution: 
Population Size, Genetic 
Recombination and Effect Size 


Forward genetic mapping relies largely on the 
recombination fraction between a QTL and the 
genetic markers that have been genotyped in the 
population, and the heritability of the target trait. 
These considerations are reviewed in more detail 
elsewhere (e.g. Cockram and Mackay 2018), but 
in general greater genetic mapping resolution can 
be attained by increasing population size and/or 
undertaking additional rounds of crossing. Larger 
populations also have the benefit of providing 
greater QTL detection power. Important to con- 
sider is the heritability of the target trait and the 
effect size of the QTL detected. The more her- 
itable a trait is, and the larger its effect size, the 
easier it is to detect and precisely locate. Indeed, 
most wheat QTL resolved to the underlying 
gene level are for highly penetrant major genes, 
such as gene-for-gene disease resistance loci 
(e.g. for a recent list of cloned wheat rust resist- 
ance genes, see Bouvet et al. 2022b), awn pres- 
ence/absence (Huang et al. 2020), vernalization 
response (first undertaken in T. monococcum: Yan 
et al. 2003; Yan et al. 2004), plant height (Tian 
et al. 2022) and grain quality (Uauy et al. 2006). 
If trait heritability is low, phenotypic replication 
can increase line mean heritability and has been 
used to refine and update the genetic interval of a 
locus on chromosome 5A controlling ~10% variation
for wheat grain size (Brinton 2017; Brinton
et al. 2017). Aside from such highly penetrant
genetic loci, the genetic architecture of most target
traits in wheat is highly quantitative in nature.
For example, the mean QTL effect size for grain 
size traits in wheat is less than 10%, compared 
to more than 20% in the diploid cereal rice, and 
is likely due to the buffering effect of homoeo- 
logues of overlapping function in hexaploid 
wheat (Brinton and Uauy 2019). 
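The effect of phenotypic replication on line mean heritability can be illustrated with a short calculation. The sketch below uses the standard textbook expression H² = Vg / (Vg + Ve / r), which is not given in this chapter and is introduced here as an assumption; it ignores genotype-by-environment interaction, and the variance values are invented for illustration.

```python
def line_mean_heritability(var_genetic: float, var_error: float,
                           n_reps: int) -> float:
    """Line mean heritability under a simple one-environment model.

    H^2 = Vg / (Vg + Ve / r), where r is the number of replicates per
    line. This model is an assumption of the sketch, not taken from the
    chapter.
    """
    if n_reps < 1:
        raise ValueError("n_reps must be at least 1")
    return var_genetic / (var_genetic + var_error / n_reps)


if __name__ == "__main__":
    # Hypothetical variance components: Vg = 1, Ve = 4
    for reps in (1, 2, 4, 8):
        h2 = line_mean_heritability(1.0, 4.0, reps)
        print(f"{reps} replicate(s): H^2 = {h2:.2f}")
```

Under these illustrative values, each doubling of replication raises line mean heritability with diminishing returns, which is the trade-off to weigh against the cost of phenotyping.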


15.5 Population Types 


The identification of functional gene variants via 
genetic mapping relies on the capture of suffi- 
cient genetic diversity and genetic recombination. 
Fundamentally, two broad experimental population 
types are employed by researchers interested in 
identifying genetic loci controlling traits of inter- 
est. Both exploit genetic variation, and the reshuf- 
fling of this variation via genetic recombination, 
in order to associate markers or groups of markers 
(haplotypes, see also Chap. 9) with target traits. 


15.5.1 Experimental Populations 


Experimental populations are derived from 
crossing two or more parents to produce prog- 
eny in which genetic loci can be identified by 
the strength of the associations between genetic 
markers and traits of interest. Examples of some 
commonly used experimental populations are 
listed below and are illustrated in Fig. 15.2. 


15.5.1.1 Bi-parental 

Bi-parental populations are most commonly
used in wheat forward genetics research and
are constructed by first crossing two parents to
generate first filial (F1) derived progeny lines.
Inbred progeny are generated either by single
seed descent (whereby individual F2 lines are
selfed over three or more generations to achieve
acceptable levels of homozygosity genome-wide)
or via doubled haploid (DH) approaches (where
haploid F1-derived gametes undergo chromosome
doubling, resulting in completely inbred
progeny in a single generation) (Fig. 15.2).
Despite DH lines typically taking less time to
create compared to RILs, DH populations capture
less genetic recombination. This is because
additional genetic recombination events can
occur between regions of heterozygosity from
the F3 generation (25% heterozygous) until
effective fixing at around the F7 stage (1.6%
heterozygous) or beyond, which on average
is equivalent to one additional round of crossing.
Bi-parental populations are now beginning
to be constructed from wheat cultivars
with genome assemblies, such as the CHINESE
SPRING × PARAGON population (Wingen
et al. 2017).

Fig. 15.2 Illustration of experimental population and
association mapping panel designs. Number of founders
illustrated in each panel is indicated in brackets.
... or doubled haploid approaches) to produce multiple
inbred lines. AIC = advanced intercross, two rounds of
intercrossing illustrated, prior to the production of inbred
lines. NAM = nested association mapping. NCII = North
Carolina II model. MAGIC = multifounder advanced
generation intercross. AM = association mapping
population
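The heterozygosity figures quoted for single seed descent (25% and 1.6%) follow from the expectation that selfing halves heterozygosity each generation. A minimal sketch, assuming the F1 is heterozygous at every locus that differs between the parents and that no selection acts (the function name is hypothetical):

```python
def expected_heterozygosity(generation: int) -> float:
    """Expected heterozygous fraction in generation F<generation>.

    Assumes repeated selfing from a fully heterozygous F1, so the value
    halves with each generation: 0.5 ** (generation - 1).
    """
    if generation < 1:
        raise ValueError("generation must be 1 (F1) or higher")
    return 0.5 ** (generation - 1)


if __name__ == "__main__":
    for n in range(1, 9):
        print(f"F{n}: {expected_heterozygosity(n):.2%} heterozygous")
```

Under this assumption, 25% heterozygosity corresponds to the F3 and roughly 1.6% to the F7.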


15.5.1.2 Advanced Intercross 

Even when bi-parental populations are created 
via single seed descent, the amount of genetic 
recombination captured can be relatively low. 
One way to increase the number of genetic
recombinations is to continue random intercrossing
of the F2 for one or more generations
before the production of inbred lines (Fig. 15.2).
Such advanced intercross (AIC) population
designs (Darvasi and Soller 1995) provide
greater precision compared to standard bi-parental
populations of the same size. For example,
Darvasi and Soller (1995) estimated that eight
rounds of random intermating would reduce
a QTL interval from 20 to 3.7 cM. While AIC
populations have been used in species such as
Arabidopsis (Fitz et al. 2014) and maize
(Balint-Kurti et al. 2008), they have yet to be
implemented in wheat, likely due to the time
required to undertake additional rounds of
crossing. However, the advent of ‘speed breeding’
approaches that allow the generation time of
both spring (Watson et al. 2018) and winter
(Cha et al. 2022) wheat varieties to be reduced
means that, for primary QTL screens, AIC
approaches in wheat should become a more
attractive prospect.


15.5.1.3 Near Isogenic Line Pairs, 
Introgression Lines 
and Chromosome Segment 
Substitution Lines 
A near isogenic line (NIL) captures a relatively
small chromosomal region from one ‘donor’
parent within the wider genomic context of a
second ‘recipient’ parent (Fig. 15.2). NILs are
generated via repeated rounds of backcrossing,
often with the use of genetic markers to select
for donor alleles at the target chromosomal region,
and for recipient alleles across the remainder of the
genome. NILs are commonly used to target specific
QTL of interest, allowing the effect of the
contrasting alleles captured in the NIL pair to be
evaluated using a single pair of lines, rather than
a larger population in which additional genetic
loci affecting the target trait may be segregating.
Following this approach, individual genetic
loci controlling a target trait can be investigated
in detail, and the underlying physiology
and pleiotropic effects on related traits can be
assessed. Further, a NIL pair can be crossed to
generate additional genetic recombination and so
further refine the genetic interval. For example,
contrasting alleles at a major effect genetic locus
for wheat grain weight identified in a bi-parental
population of 192 inbred lines were subsequently
assessed via phenotypic evaluation of NILs derived
from two backcross generations, finding that (i) the
~7% increase in grain weight was mediated
predominantly by increased grain length, (ii) maternal
pericarp cell length was longer in the NIL carrying
the high grain weight allele, and (iii) increased
grain length was detectable 12 days after fertilisation
(Brinton et al. 2017). Additionally, the
NILs were used to further refine the genetic
interval to 4.3 cM (Brinton et al. 2017), with
further analysis indicating that two genetic loci
may be present at the locus (Brinton 2017).
A series of NILs that capture chromosomal
segments from wild and domesticated wheat
relatives are termed introgression lines. Recent
work in the UK has generated such germplasm
resources for a range of wheat relatives. These
include diploid Ae. caudata (CC genome; Grewal
et al. 2020), Am. muticum (TT; Coombes
et al. 2022), Th. bessarabicum (JJ; Grewal et al.
2018) and T. urartu (AᵘAᵘ; Grewal et al. 2021),
tetraploid T. timopheevii (AᵗAᵗGG; Devi et al.
2019), hexaploid Th. intermedium (JJJᵛˢJᵛˢStSt;
Cseh et al. 2019) and decaploid Th. elongatum
(EPE? EPE? EPF? ES'ES! ES'ES'; Baker
et al. 2020), with all introgression lines generated
using the recipient wheat cv. PARAGON.
When a series of NILs is designed to collectively
capture the entire donor background, the
resulting resource is termed a chromosome segment
substitution line (CSSL) population. In
wheat, CSSL populations that capture novel
A, B and D subgenome diversity from wheat
relatives have recently been developed using
(i) a synthetic hexaploid wheat line (Horsnell
et al. 2022) and (ii) a tetraploid T. turgidum ssp.
dicoccoides accession (TTD-140). Not only do
CSSL populations serve as useful sources of
novel variation, they can also be used directly
for genetic mapping, as recently illustrated in
wheat by Horsnell et al. (2022).


15.5.1.4 Multifounder Populations: NAM 
While bi-parental populations and derived NILs 
had long been the mainstay of forward genetic 
approaches, multifounder populations have 
recently become commonplace in plant research 
(reviewed by Scott et al. 2020b). Multi-parent 
mapping populations capture more variation 
than bi-parental populations and increase preci- 
sion via joint linkage and association analysis. 
Nested association mapping (NAM) popula- 
tions represent a series of bi-parental popula- 
tions (termed ‘families’), each of which has the 
same parent in common (Fig. 15.2). The first 
NAM population was made in maize (Zea mays 
L.) by crossing 25 diverse inbred lines with 
the inbred line B73 (termed here the ‘linking’ 
founder)—one of the most widely used lines 
in the history of maize breeding, and the line 
used for the maize reference genome (Yu et al.
2008). Since then, the maize NAM parents have
become extensively characterised, including
provision of their genome assemblies (Gage
et al. 2020). The genetic resolution obtained
from NAM populations largely depends on the 
number of alleles present in the founders and 
the amount of genetic recombination captured 
in the progeny. The rarest alleles in any NAM 
population will be present in half of the prog- 
eny from the corresponding family. Therefore, 
in a NAM with 25 families and 200 progeny 
per family, rare alleles are expected to be pre- 
sent in 100 of the total 5000 progeny lines, i.e. 
a frequency of 2%. For NAM design, increasing 
the number of founders at the expense of family
size should be preferable, as the decay of
parental linkage disequilibrium for a given 
allele would likely, on average, be shared 
among more parents (Gage et al. 2020). NAM 
populations have now been made in many crop 
species and can be genetically analysed using 
association mapping approaches. At least part 
of the attraction of NAM design is that their 
composition (a series of bi-parental populations 
with a common parent) makes them more con- 
ceptually familiar to researchers experienced 
with bi-parental populations. Indeed, once a 
genetic locus has been identified in a NAM, it 
is straightforward to continue further analysis 
using one or more of the relevant constituent 
bi-parental populations. To date, several NAM 
populations have been created in wheat (Table 
15.3; Fig. 15.3). The founders used include elite 
cultivars (e.g. Bajgain et al. 2016), genetically 
diverse landraces (Wingen et al. 2017), as well 
as germplasm that captures backcrossed chromosomal
segments from wheat relatives via
synthetic hexaploid wheat and wheat × tetraploid
durum wheat (T. turgidum ssp. durum) introgression
lines. Further, a recent durum NAM
has been constructed by crossing 50 durum lan- 
draces to an Ethiopian durum cultivar (Kidane 
et al. 2019). The largest wheat NAM currently 
available was constructed using 60 inbred 
worldwide landraces from the Watkins wheat 
landrace collection, backcrossed to the spring 
UK cultivar PARAGON, generating a popula- 
tion of 1192 RILs and a mean of 105 RILs per 
family (Wingen et al. 2017). Therefore, the rar- 
est allele captured in the Watkins NAM would 
be expected to be present in 4% of the popu- 
lation—a frequency nominally sufficient for 
detection via genetic analysis. 
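The rare-allele expectation for a NAM design can be written as a short calculation. The sketch below assumes equal family sizes and that an allele unique to one non-linking founder segregates 1:1 in that founder's family; the function name is hypothetical.

```python
def nam_rarest_allele_frequency(n_families: int, progeny_per_family: int) -> float:
    """Expected population-wide frequency of an allele carried by one founder.

    Assumes equal family sizes and 1:1 segregation within the family
    derived from that founder (illustrative assumptions).
    """
    if n_families < 1 or progeny_per_family < 1:
        raise ValueError("both arguments must be positive")
    carriers = progeny_per_family / 2            # half of one family
    total_lines = n_families * progeny_per_family
    return carriers / total_lines


if __name__ == "__main__":
    # The worked example from the text: 25 families of 200 progeny
    freq = nam_rarest_allele_frequency(25, 200)
    print(f"Rarest allele frequency: {freq:.1%}")
```

With equal family sizes the result reduces to 1 / (2 × number of families), so adding families lowers the frequency of founder-specific alleles regardless of family size.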


15.5.1.5 Multifounder Populations: North 
Carolina II Model 
A notable limitation of NAM populations is
that, while multiple founders are employed, only
a single ‘linking’ parent is used for crossing.
The North Carolina II (NCII) design
of Comstock and Robinson (1952) is conceptually
an extension of NAM, whereby two or
more ‘linking’ parents are used such that every
progeny family has half-sib relationships both
through a common mother and through a common
father (Fig. 15.2). Similarly, any combination
of populations with founder links between
them can be analysed together to undertake
genetic analysis and to increase power and
precision by increasing sample size (Cockram and
Mackay 2018). However, such populations are
more commonly used for the detection of
QTL in different genetic backgrounds and for the
analysis of epistasis.

Fig. 15.3 Features of existing wheat nested association
mapping (NAM) populations, comparing mean NAM
progeny per component bi-parental population (x-axis)
with the number of NAM founders (y-axis) and the size
of the resulting population (proportional to the size of the
circle)


15.5.1.6 Multifounder Populations: MAGIC

While NAM and NCII populations capture more
diversity than bi-parentals, they capture no more
genetic recombination than bi-parental
populations of the same size. Since its pioneering
use in mouse in 2002 (The Complex Trait
Consortium 2002), the multi-parent advanced
generation intercross (MAGIC) design has
been applied to many crop species (Scott et al.
2020b). To aid crossing design, MAGIC populations
typically use 4, 8 or 16 founders. Unlike
NAM or NCII populations, all MAGIC
founders are intercrossed over multiple rounds
of crossing to produce progeny that capture
equal proportions of each founder genome
(Fig. 15.2). Thus, MAGIC combines the benefits
of increased genetic diversity afforded
by NAM and NCII with the increased amounts of
genetic recombination afforded by AIC, while
minimising population structure via controlled
crossing. In contrast to bi-parental populations,
which are typically constructed to target a single
trait and are relatively quick to generate,
MAGIC populations aim to capture and
recombine multiple alleles across the genome
and therefore take much longer to create.
However, once complete, MAGIC, as well as
other multi-parent populations, are well suited
as community resources. In wheat, six MAGIC
populations have been published, the first of
which was the Australian spring wheat 4-parent
MAGIC (Huang et al. 2012). Since then, five
additional MAGIC populations have been created:
8-parent populations from Australia (Shah
et al. 2019), the UK (Mackay et al. 2014) and
Germany (Sannemann et al. 2018; Stadlmeier
et al. 2018), as well as a 16-parent European
wheat MAGIC (Scott et al. 2020a) (Fig. 15.4).
Additionally, a MAGIC-like wheat population,
made by crossing and backcrossing one male-sterile
line with 59 European/worldwide lines
followed by 12 generations of random intermating,
has been generated (Thépot et al. 2014).
To date, the 8-founder NIAB Elite MAGIC
population likely has the most publicly available
resources, including the population
and associated 90k array SNP data (Mackay
et al. 2014) and genetic map (Gardner et al.
2016), genome assemblies for two of the founders
(Walkowiak et al. 2020), and phenotypic
and genetic data for numerous traits including
disease (Bouvet et al. 2022a, c; Corsi et al.
2020; Lin et al. 2020a; Riaz et al. 2020), flowering
time (Wittern et al. 2022), canopy architecture
(Zanella et al. 2022), ear architecture
(Dixon et al. 2018), end-use quality and mineral
content (Fradgley et al. 2022a). Additionally,
BLAST access to the genome assemblies for the
remaining six founders is currently available
ahead of publication
(https://www.cropdiversity.ac.uk/8magic-blast/)
and release of whole
genome skim sequencing data for the RILs is
imminent (J Cockram personal communication).

Fig. 15.4 Crossing diagram illustrating the founder
selection and pedigree of the wheat 16-parent ‘NIAB
Diverse MAGIC’ population. The red and blue lines each
track the pedigree of a single recombinant inbred line
(RIL) through the pedigree


15.5.2 Founder Selection 


Founder choice in any structured population is 
one of the first decisions addressed and depends 
to some degree on population type. For a bi- 
parental population, founders that contrast for a 
specific trait of interest are typically selected. In 
some cases, selection criteria will also include 
selection for specific traits that may otherwise 
confound the target phenotype. For example, 
founders with similar ear emergence date may 
be selected to avoid pleiotropic effects on dis- 
eases such as Fusarium head blight that affect 
the wheat ear. However, the differential presence 
of alleles of contrasting effect between founders 
may mean that while the parents may have been 
selected for similar phenotype, segregation for 
the phenotype may still be observed in the prog- 
eny. For NAM and MAGIC populations, found- 
ers should generally be selected to maximise 
genetic diversity, particularly in those designs 
that include larger founder numbers. For NAM
populations, the selection of the ‘linking’ founder
is particularly important, as each progeny line will
sample 50% of its genome, and its genome will
therefore be highly represented in the population.
‘Linking’ founders typically represent a line which
has been particularly well characterised, or is
common in the wheat pedigree within the target
geographical region. For example, the cultivar
PARAGON has been selected as the ‘linking’ founder
in three wheat NAM populations: Watkins-60 (Wingen
et al. 2017), NIAB SHW and NIAB TetHex
(data repository at https://niab.github.io/niab-dfw-wp3/).
PARAGON is a spring UK variety
released in 1988 which has a sequenced genome
(Walkowiak et al. 2020), RNA sequence (RNA-seq)
data from multiple tissues and a gamma-irradiated
series of deletion lines (available via
https://www.jic.ac.uk/research-impact/germplasm-resource-unit/).
Similar considerations
apply to the selection of linking founders in NCII
population designs, although as two or more such
founders are used, more flexibility is afforded.

If the aim of the population is to generate
data under field conditions, founders should
be suited for growth in the environments under
which they will be phenotyped. When constructing
populations using elite varieties, this
should be relatively straightforward. For example,
in the NIAB Diverse MAGIC population,
the 16 founders were selected to sample maximum
genetic diversity across a wider collection
of 94 European winter wheat cultivars released
over a 70-year period, for assessment under UK
field conditions (Scott et al. 2020a). However,
for populations that capture variation from
landraces or species related to wheat, especially if
these donors originate from geographic areas
distant to the target environment, adaptability of
the resulting populations to local field environments
could be more problematic. In bi-parental
or NAM populations, one way to address this is
to generate populations from the backcross-1 (BC1)
generation (where each progeny line contains
on average 25% of the non-recurrent founder
genome) or beyond, rather than from the F1,
which is expected to contain a 50% contribution
from each founder. This approach is logistically
harder, as it involves an additional round
of crossing and requires more progeny than
an F1-derived population to effectively sample
the non-recurrent founder genome. However,
if the aim is to generate phenotypic data under
field conditions, such approaches may be beneficial.
For MAGIC designs, as each progeny
line represents a balanced genomic mosaic of
all founders, the inclusion of one, or possibly
more, ‘wilder’ founder genomes is slightly
less problematic. For example, in an 8-founder
MAGIC which includes one ‘wilder’ founder,
each RIL would be expected to contain a 1/8th
genomic contribution from the ‘wilder’ founder.
While no such MAGIC populations have been
constructed to date in wheat, the most diverse
is the INRA MAGIC-like population developed
using one male-sterile line (cv. PROBUS)
crossed and backcrossed with 59 European/
worldwide lines before 12 generations of random
intermating to generate 1000 lines (Thépot
et al. 2014). Finally, for all population designs,
it may be useful to consider the size and extent
of any genomic rearrangements (e.g. the chromosome
5AL/7AL translocation; Walkowiak
et al. 2020) or chromosomal introgressions
from wheat relatives, as their presence is likely
to disrupt local genetic recombination rates.
While such regions may specifically be sought,
for example the Ae. tauschii (D) and T. turgidum
ssp. durum (AB) genomic contributions captured
in the NIAB SHW NAM, it is possible
that one or more founders are unintentionally
selected that contain such features. For example,
in the 16-founder NIAB Diverse MAGIC
population, cv. MARIS FUNDIN carries a large
introgression of 540 Mb from T. timopheevii on
chromosome 2B which is substantially over-represented
in the MAGIC progeny (Fig. 15.5)
(Scott et al. 2020a). Segregation distortion
due to introgressions was also identified in the
8-founder NIAB Elite MAGIC, for example due
to the chromosome 1B/1R wheat/rye introgression
in cvs. BROMPTON and RIALTO and the
presence of an introgression on the long arm
of chromosome 4A in cv. ROBIGUS (Gardner
et al. 2016).

Fig. 15.5 ... from T. timopheevii present in the NIAB
Diverse MAGIC founder MARIS FUNDIN, as identified by
analysis of exome-promotor capture sequence data of the
16 founders. The introgression is visualised here by the
increase in non-reference (relative to chromosome 2B
IWGSC ... and as reduced sequence coverage (bottom) in
MARIS FUNDIN, compared to the remaining 15 founders.
Scott et al. (2020a) find the introgression to be
substantially over-represented in the MAGIC progeny
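The expected founder contributions discussed in this section (50% for F1-derived lines, 25% for backcross-1-derived lines, 1/8th for an 8-founder MAGIC) can be reproduced with a small sketch. The function names are hypothetical, and the values are averages that assume no selection or segregation distortion, which the introgression examples above show can be violated in practice.

```python
def backcross_donor_fraction(n_backcrosses: int) -> float:
    """Expected donor (non-recurrent) genome fraction.

    n_backcrosses = 0 corresponds to the F1 (50%); each backcross to
    the recurrent parent halves the expected donor contribution.
    """
    if n_backcrosses < 0:
        raise ValueError("n_backcrosses cannot be negative")
    return 0.5 ** (n_backcrosses + 1)


def magic_founder_fraction(n_founders: int) -> float:
    """Expected contribution of each founder in a balanced MAGIC design."""
    if n_founders < 2:
        raise ValueError("a MAGIC design needs at least two founders")
    return 1.0 / n_founders


if __name__ == "__main__":
    print(f"F1-derived:  {backcross_donor_fraction(0):.1%} donor genome")
    print(f"BC1-derived: {backcross_donor_fraction(1):.1%} donor genome")
    for founders in (4, 8, 16):
        share = magic_founder_fraction(founders)
        print(f"{founders}-founder MAGIC: {share:.2%} per founder")
```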


15.5.3 Association Mapping Panels 


The experimental populations described above 
take time to construct. However, it is possi- 
ble to exploit the genetic variation and histori- 
cal genetic recombination captured in existing 
collections of wheat varieties, landraces or 
accessions (Fig. 15.2). Such association map- 
ping approaches aim to locate QTL based on 
the strength of the association between genetic 
markers and the target trait(s) and rely on the 
decay of linkage disequilibrium between mark- 
ers and QTL over genetic distance (Cockram 
and Mackay 2018). Genetic analysis of association
mapping panels can be conducted using
markers from candidate genes, or from across
the genome using a whole genome association
scan (GWAS) approach. Most commonly,
single markers are regressed against the target 
trait. However, power can be increased by con- 
structing haplotypes from the genotypic allele 
calls of two or more genetic variants that are 
closely physically or genetically linked within a 
defined region (haploblock). Use of haplotypes 
in GWAS can improve the estimation of allelic 
effects and increase statistical significance 
and is increasingly used in wheat. For example,
a linkage disequilibrium approach to defining
haploblocks in a panel of 6333 wheat lines
genotyped with 14,027 GbyS genetic markers
resulted in the identification of 537 genome-wide
haploblocks for downstream GWAS of
grain yield (Sehgal et al. 2020). Alleles present
at a frequency of less than 5% within the panel 
will typically not be detected, even if these 
alleles have relatively high effect sizes and/ 
or the causative polymorphism is assayed. In 
human genetics, approaches that help identify
rare alleles in GWAS are increasingly being
used (reviewed by Lee et al. 2014), such as
aggregation tests that evaluate cumulative effects
of multiple genetic variants in a gene or region.
The ability to generate experimental populations
in plants means that such approaches are not as
necessary to explore.
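The single-marker regression and allele-frequency threshold described above can be sketched in plain Python. This is an illustrative toy rather than a GWAS implementation: it assumes inbred lines with marker calls coded 0/1, applies no correction for population structure or kinship, and all function names and the simulated data are hypothetical.

```python
import random


def minor_allele_frequency(calls):
    """Minor allele frequency for 0/1 marker calls in inbred lines."""
    freq = sum(calls) / len(calls)
    return min(freq, 1.0 - freq)


def single_marker_regression(calls, phenotypes):
    """Least-squares slope and R^2 of phenotype regressed on one marker."""
    n = len(calls)
    mean_x = sum(calls) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in calls)
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(calls, phenotypes))
    if sxx == 0 or syy == 0:
        raise ValueError("marker and phenotype must both vary")
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


def scan(markers, phenotypes, min_maf=0.05):
    """Regress the trait on each marker, skipping alleles rarer than min_maf."""
    results = {}
    for name, calls in markers.items():
        if minor_allele_frequency(calls) < min_maf:
            continue  # too rare to test reliably in this panel
        results[name] = single_marker_regression(calls, phenotypes)
    return results


if __name__ == "__main__":
    rng = random.Random(1)
    n_lines = 200
    markers = {f"m{i}": [rng.randint(0, 1) for _ in range(n_lines)]
               for i in range(5)}
    markers["rare"] = [1] * 4 + [0] * (n_lines - 4)  # 2% frequency, filtered out
    # Simulated trait: marker m2 adds 2 units, plus random noise
    trait = [2.0 * g + rng.gauss(0.0, 1.0) for g in markers["m2"]]
    for name, (effect, r2) in scan(markers, trait).items():
        print(f"{name}: effect = {effect:.2f}, R^2 = {r2:.3f}")
```

Because rare markers are dropped before testing, a causal allele below the threshold is never evaluated, which mirrors the detection limit noted in the text.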

Unlike the case in most experimental popu- 
lations in which allele frequency is relatively 
equally distributed among the progeny, asso- 
ciation mapping panels are often characterised 
by notable levels of population substructure or 
subdivision. This is due to the differences in 
the shared ancestry of the lines over time, due 
to non-random mating. In cereal crops, popula- 
tion structure commonly arises from (i) physi- 
cal separation, i.e. (geographic location), (ii) the 
contrasting germplasm preferences within dif- 
ferent breeding companies, (iii) seasonal growth 
habit (i.e. spring or winter-sown) and (iv) traits 
underlying end-use quality (such as malting 
or feed in barley, or bread making versus in 
wheat) (Cockram et al. 2010; White et al. 2022) 
and yield (Sharma et al. 2022). For example,
while relatively few major genetic determinants 
control the spring versus winter phenotype
(Bentley et al. 2013), the common practice that 
spring cultivars are typically bred from other 
spring lines, while winters are bred from win- 
ters means that any genetic variants present at 
notably different frequencies between these 
two germplasm pools continue to show skews 
in their frequency in progeny lines. Thus,
if a favourable allele controlling a trait of inter- 
est happened to segregate predominantly in the 
spring pool, then the population structure inher- 
ent within spring varieties may lead to false- 
positive genotype-trait associations (termed 
Type-I errors) that are not due to close linkage 
of markers with the underlying QTL. It is pos- 
sible to control statistically for population struc- 
ture (Q) by using genetic markers to determine 
a Q-matrix of population membership estimates 
for each accession in the panel. Q-matrices 
can be determined using programmes such 
as STRUCTURE (Pritchard et al. 2000) or
via principal component analysis (Zhao et al. 
2007). Additional correction for more recent 
similarities due to close kinship (K) can also be 
included and can be determined using genetic 
markers. Indeed, approaches such as the Q+K 
mixed model (Yu et al. 2006) that account for 
multiple levels of relatedness between individu- 
als have been shown to control well for false- 
positive as well as false-negative (Type-II error) 
associations and often lead to higher power than 
correction via Q or K alone (Yu et al. 2006).
However, accounting for population structure/ 
kinship sacrifices some level of experimental 
power to detect those genetic loci that are cor- 
related with the adjustments made. Nevertheless, 
power and precision to detect genetic loci in 
association mapping panels can be high, com- 
pared to experimental populations of the same 
size. While improved power can be achieved 
by increasing the number of individuals in the 
panel, the inclusion of additional accessions 
may increase population substructure and/or 
kinship. Similarly, linkage disequilibrium may 
decay quite slowly with genetic distance in
cultivars (due to close kinship among all lines), 
which will reduce the precision to detect QTL 
(Cockram and Mackay 2018) but will increase 
power. Conversely, linkage disequilibrium in
panels of landraces typically decays more rapidly, enabling
greater genetic mapping precision. Genotyped
wheat landrace collections are now available
that sample diversity within single countries
(e.g. China, Zhou et al. 2017) or from around 
the world—such as the Watkins (Wingen et al. 
2014) and Vavilov collections (Riaz et al. 2018). 
These are beginning to be used for GWAS of 
agronomic traits, such as disease resistance (tan 
spot, Halder et al. 2019; leaf rust, Riaz et al.
2018; stripe rust, Jambuthenne et al. 2022) and
pre-harvest sprouting (Zhou et al. 2017). Given 
the multiple variables affecting GWAS in asso- 
ciation mapping panels, it is useful to determine 
the efficacy of experimental design by undertak- 
ing power calculations, especially if population 
size is relatively small (e.g. White et al. 2022). 
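
As an illustration of how kinship (K) can be determined from genetic markers, the sketch below builds a marker-based relationship matrix in plain Python. It uses the widely used VanRaden-style formula, K = ZZ'/(2 * sum of p(1 - p)), which is one common choice and is not named in the text above. Genotypes are assumed to be coded as counts of one allele (0, 1 or 2, so inbred lines carry 0 or 2), and the function names and example data are hypothetical. The sketch covers only K; estimating the Q-matrix and fitting the Q+K mixed model itself are left to dedicated software.

```python
def allele_frequencies(genotypes):
    """Per-marker frequency of the counted allele.

    genotypes[line][marker] holds the allele count (0, 1 or 2).
    """
    n_lines = len(genotypes)
    n_markers = len(genotypes[0])
    return [sum(row[j] for row in genotypes) / (2 * n_lines)
            for j in range(n_markers)]

def kinship_matrix(genotypes):
    """K = Z Z' / (2 * sum p(1 - p)), with Z the genotypes centred on 2p."""
    p = allele_frequencies(genotypes)
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    z = [[g - 2 * pj for g, pj in zip(row, p)] for row in genotypes]
    n = len(z)
    return [[sum(a * b for a, b in zip(z[i], z[k])) / scale
             for k in range(n)]
            for i in range(n)]

# Hypothetical panel: two pairs of identical inbred lines, two markers.
panel = [
    [2, 2],
    [2, 2],
    [0, 0],
    [0, 0],
]
for row in kinship_matrix(panel):
    print(row)
```

In this toy panel the lines within each pair receive a high positive relationship value and lines from different pairs a negative one, which is the pattern a mixed model would use to down-weight associations that merely track relatedness.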


15.6 Reverse Genetics Germplasm 
Platforms 


Functional validation of genes genetically 
mapped using experimental or association 
mapping populations can be undertaken using 
reverse genetics approaches. Transgenic meth- 
ods aim to alter gene expression or function, 
typically via gene overexpression, gene silenc- 
ing or gene editing (reviewed in wheat by 
Adamski et al. 2020). Alternatively, non-transgenic
reverse genetics approaches are available
that exploit genetic variation induced by
mutagenizing agents. In wheat, the most com- 
monly used are Targeting Induced Local Lesions 
in Genomes (TILLING) populations, created by 
using an inbred donor line (termed the M0 generation)
and applying the chemical agent ethyl
methanesulphonate (EMS). The resulting EMS-treated
seed is termed the M1 generation, which
can be subsequently selfed over several gen- 
erations to generate a population of TILLING 
lines in which the EMS-generated mutations 
become progressively fixed in homozygous 
state. Bespoke experiment-specific TILLING 
populations are frequently used to determine 
genes underlying traits controlled by single
major effect genes, such as gene-for-gene
disease resistance. In such cases, a wheat line
for which resistance to the target disease is con- 
trolled by a single major effect locus is mutated, 
and susceptible TILLING lines identified pheno- 
typically. Assuming the underlying gene can be 
sequenced, relatively low numbers of TILLING
lines with independent mutations at the target 
locus are generally sufficient to give a high sta- 
tistical probability of identifying the causative 
gene. For example, Sánchez-Martín et al. (2016) 
estimated the probability of the 12 kb gene-containing
contig of their target wheat gene
(Pm2, conferring resistance to powdery mildew)
being mutated by chance across all 12 identified powdery
mildew susceptible TILLING mutants to be 1 in
300,000,000,000. Several approaches to apply- 
ing DNA sequencing to such gene identifica- 
tion approaches have been published: the first 
uses exome capture of pre-determined can- 
didate gene families (termed resistance-gene 
enrichment sequencing, RenSeq, when applied 
to NLR disease resistance gene families; Jupe
et al. 2013). The second, termed MutChromSeq, 
involves flow sorting and direct sequencing of 
the target chromosome in each of the phenotypi- 
cally identified TILLING lines (Sánchez-Martín 
et al. 2016). In addition to such experiment-specific
TILLING resources, exome capture
sequence data from large numbers
of TILLING lines generated from the spring
bread wheat cv. CADENZA (1200 lines) and
the tetraploid wheat cv. KRONOS (1535 lines)
have been made publicly available (Krasileva
et al. 2016). The resulting TILLING mutations
have been aligned against the bread wheat
reference genome of cv. CHINESE SPRING
(RefSeq v1.1; IWGSC 2018) and are searchable via
the Ensembl Plants (Cunningham et al. 2022)
genome browser. The effects of mutations on 
protein sequence have been predicted in rela- 
tion to CHINESE SPRING gene models, with 
deleterious mutations determined to be present 
in around 90% of the captured genes. The ability
to identify and prioritise TILLING mutants
in silico means these resources serve as use- 
ful genome-wide resources for gene functional 
validation in wheat. Considerations for the 
identification and validation of wheat TILLING 
mutants in the CADENZA and KRONOS populations
are listed in more detail by Adamski
et al. (2020) and include the need to combine 
TILLING mutants in multiple homoeologues 
to overcome possible functional redundancy as 
well as the need to undertake sufficient rounds 
of backcrossing to remove background muta- 
tions. Examples of their use for gene functional 
characterisation include (i) wheat candidate 
genes orthologous to map-based cloned genes
from model species (e.g. TaGRAIN WIDTH2, 
Simmonds et al. 2016), (ii) wheat genes identi- 
fied via forward phenotypic screening followed 
by bulk segregant analysis of backcross derived 
progeny between mutant line and wild-type (e.g. 
HOMEOBOX DOMAIN-2, Dixon et al. 2022)
and (iii) candidate genes underlying wheat 
genetic loci previously refined by fine-mapping 
(e.g. WHEAT ORTHOLOG OF APO1, Kuzay
et al. 2022; EARLY FLOWERING 3, Wittern
et al. 2022). While the ability to screen in silico 
the cv. CADENZA and KRONOS TILLING 
populations provides proven community
resources for gene functional characterisation, 
they can only be used for those genes present in 
the two founding cultivars used. The availabil- 
ity of annotated genome assemblies for multiple 
wheat varieties now provides the underpinning 
knowledge from which it may in future be pos- 
sible to develop additional sequenced TILLING 
resources that target genes not captured in cv. 
CADENZA and KRONOS. 
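
The reasoning behind the very small chance probabilities quoted above for independent mutants can be sketched as follows. If induced mutations are assumed to fall independently and uniformly along the genome (a simple Poisson model), the chance that a region is hit in one line is 1 - exp(-density x length), and the chance that the same region is hit in every one of n independent mutants is that value raised to the power n. The Poisson assumption, the function names and the mutation density used below are illustrative assumptions; this is not the calculation published by Sánchez-Martín et al. (2016) and it is not expected to reproduce their figure.

```python
from math import expm1

def p_region_mutated(mutations_per_bp, region_bp):
    """Chance that one mutant line carries at least one induced mutation
    in the region, assuming independent, uniformly placed mutations."""
    return -expm1(-mutations_per_bp * region_bp)

def p_all_mutated_by_chance(mutations_per_bp, region_bp, n_mutants):
    """Chance that every one of n independent mutants hits the same region."""
    return p_region_mutated(mutations_per_bp, region_bp) ** n_mutants

# Hypothetical mutation density, chosen only for illustration.
DENSITY = 1 / 100_000   # one mutation per 100 kb (assumed)
REGION = 12_000         # a 12 kb contig, as in the example in the text

for n in (1, 3, 6, 12):
    print(n, p_all_mutated_by_chance(DENSITY, REGION, n))
```

Because the per-line probability is raised to the power of the number of mutants, each additional independent susceptible mutant carrying a lesion in the same contig reduces the chance explanation multiplicatively, which is why a dozen mutants can be sufficient.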


15.7 The Future of Genetic 
Recombination 


Genetic recombination in wheat is enriched in 
the telomeric regions and becomes progres- 
sively less frequent towards the pericentromeric 
and centromeric regions, with 80% of recombi- 
nation events occurring in less than a quarter of 
the genome (e.g. Gardner et al. 2016; IWGSC 
2018). As genetic mapping relies on the occur- 
rence of recombination, being able to increase 
recombination at chromosomal regions of inter- 
est would help both genetic mapping preci- 
sion, and the ability to recombine different 
haplotypes in breeding. Analysis of crossover
events in RIL populations has identified QTL 
for genetic recombination frequency, such as 
a locus on chromosome 6A in the CHINESE 
SPRING x PARAGON population control- 
ling around 6% of the variation (Gardiner et al. 
2019). Further, recent work shows that recom- 
bination events in wheat pericentric regions can 
be increased in some chromosomes by increas- 
ing temperature during meiosis (Coulton et al. 
2020), although this does come with reduced 
fertility (Draeger and Moore 2017). Transgenic 
approaches for altering genetic recombination 
rates and locations are also now being inves- 
tigated. For example, transient virus induced 
gene silencing (VIGS) of wheat candidate genes 
homologous to genes in other species shown to 
control genetic recombination shows it is pos- 
sible to alter the distribution of recombination 
along chromosomes (Raz et al. 2021). VIGS
silencing of the durum wheat homologue of the 
anti-crossover gene XRCC2 (a paralogue of
RAD51) in F1 plants ahead of meiosis resulted
in increased genetic recombination across much
of the pericentromeric region of chromosome
4B, as well as the more distal pericentromeric
regions of chromosome 5B (Raz et al. 2021).
Such results indicate that it should be possible to 
increase genetic recombination in at least some 
of the pericentromeric landscape of wheat. The 
maturation of gene editing methodologies may 
soon enable the targeting of cross-overs and 
genetic recombination to more specific genomic 
locations. 
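
The uneven distribution of recombination described at the start of this section (most crossovers falling in a small fraction of the genome) can be summarised from crossover counts in equal-sized physical bins. The sketch below is a minimal illustration: the bin counts are invented, the function name is hypothetical, and equal bin sizes are assumed.

```python
def genome_fraction_for_share(crossovers_per_bin, share=0.8):
    """Smallest fraction of equal-sized bins that together contain at
    least `share` of all observed crossovers."""
    total = sum(crossovers_per_bin)
    if total == 0:
        raise ValueError("no crossovers recorded")
    running = 0
    ranked = sorted(crossovers_per_bin, reverse=True)
    for used, count in enumerate(ranked, start=1):
        running += count
        if running >= share * total:
            return used / len(crossovers_per_bin)
    return 1.0

# Hypothetical chromosome in 10 equal bins: crossovers concentrated at
# both distal ends and rare around the centromere.
bins = [45, 5, 2, 1, 1, 1, 2, 3, 10, 30]
print(genome_fraction_for_share(bins))
```

Comparing this summary between control and treated populations (for example, after a temperature treatment or gene silencing) would give one simple measure of whether recombination has been redistributed towards otherwise cold regions.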


15.8 Conclusions 


In parallel to the efforts to provide wheat 
genomic and genotyping tools, the wheat com- 
munity has generated extensive resources to sup- 
port genetic locus and gene characterisation via 
forward and reverse genetics approaches. For 
highly penetrant wheat genetic loci originat- 
ing from natural variants or via induced muta- 
tion, and where phenotype effectively acts as a 
genetic marker, various routes have been used 
to identify the underlying genetic loci, including 
fine-mapping in bi-parental derived germplasm,
as well as reverse genetics approaches
such as RenSeq and MutChromSeq where the
identification of multiple independent alleles
rather than genetic recombination is required.
For genetic loci of a more quantitative nature, 
to date it is those which account for an unusu- 
ally high proportion of genetic variation that 
have been fine-mapped or map-based cloned, 
using bi-parental populations and also more 
recently via multifounder populations. The 
vast majority of remaining heritable variation 
in the wheat genepool is much more quantita- 
tive in nature, typically accounting for 3-5% of 
the phenotypic variation. For such loci, includ- 
ing those located in genomic regions with very 
low genetic recombination, identification of the 
underlying genes and variants via forward map- 
ping approaches will continue to pose a chal- 
lenge. However, genetic mapping approaches 
will allow their alleles and linked haplotypes to 
be determined, and increasingly, for the epistatic 
non-additive interaction effects of these loci to 
be characterised. For wheat breeding, advances 
in our knowledge of genetic loci and gene func- 
tion will best be exploited within a quantitative 
genetics framework (Mackay et al. 2021). Trait 
improvement in the context of breeding over 
the next decade will likely focus on integration
of multi-trait ensemble phenotypic weighting
approaches (e.g. Fradgley et al. 2022b)
combined with improved genomic selection 
methodologies and field-based phenotyping at 
increasing throughput and precision. The next 
decade will likely also see the maturation of 
approaches to engineer increased genetic recom- 
bination, and to design via gene editing new 
alleles with improved function. Finally, computer
vision, artificial intelligence and machine
learning approaches are now maturing to the
point at which they can more readily be applied 
to complex challenges such as crop phenotyping 
and plant breeding. Such approaches need to be 
efficiently combined to underpin future breed- 
ing for improved crop performance, quality and 
resilience. 
Glossary


2n = 6x = 42, AABBDD n is the gametic chro- 
mosome number, 2n is somatic chromosome 
number. x is the basic chromosome number, 
which for wheat is 7. Bread wheat is hexa- 
ploid with 6 chromosome sets in its genome 
(6x), termed the AA, BB and DD subge- 
nomes. Thus, a somatic cell of the hexaploid 
bread wheat genome has a total of 42 chromosomes,
summed across its AA, BB and DD
subgenomes. 

Advanced intercross (AIC) A bi-parental population
in which F2 progeny are intercrossed
over one or more generations before the generation
of inbred lines.

Association mapping A method for genetic 
mapping of QTL that uses historic linkage 
disequilibrium to associate phenotype with
genetic markers. Also known as ‘linkage dis- 
equilibrium mapping’. 

Copy number variation (CNV) Differences in 
the number of copies of a particular gene or 
chromosomal region. Where there is a pres- 
ence or absence of a gene/region, it can also 
be termed presence/absence variation (PAV). 

Genetic recombination The rearrangement of 
DNA sequences by the breakage and re-join- 
ing of chromosome segments. 

Genome-wide association study (GWAS) A 
method for genetic mapping, using a collec- 
tion of varieties, landraces or lines from an 
experimental population with phenotypic and 
genome-wide genotypic datasets. 

Haplotype A set of DNA markers sufficiently
closely linked on the same chromosome
to be frequently inherited as a single unit.

Linkage disequilibrium (LD) Non-random 
association of genetic markers at separate 
loci that are typically located on the
same chromosome. 
Experimental population A population of lines
created by crossing two or more founders. 

Multi-parent advanced generation intercross 
(MAGIC) Experimental populations typi- 
cally made by intercrossing 4, 8 or 16 found- 
ers over multiple generations so that the 
outputs of the crossing have contributions 
from each of the founders. Inbred lines are 
then derived by single seed descent. 

Nested association mapping (NAM) A col- 
lection of two or more bi-parental popu- 
lations, where all individual bi-parental 
populations share one founder in common 
(i.e. a single recurrent parent is used). E.g. 
Founder-1 x Founder-2, 1 x 3, 1 x 4. 

North Carolina II (NCII) model A collection
of three or more bi-parental populations,
where any single bi-parental population
shares at least one founder in common 
with any other population, but where two 
or more recurrent parents are used. E.g. 
Founder-1 x Founder-2, 1 x 3, 2 x 3. 

Population substructure Presence of a system- 
atic difference in allele frequencies between 
groups of accessions, due to non-random 
mating. 

Single nucleotide polymorphism (SNP) A 
genomic variant at a single base position in a DN 